USF Libraries
USF Digital Collections

Lessons learned about boys' and girls' mathematical problem solving


Material Information

Title:
Lessons learned about boys' and girls' mathematical problem solving: the solution processes, performance, linguistic explanations, self-efficacy, and self-assessment of fifth-grade students of varying reading and mathematics abilities
Physical Description:
Book
Language:
English
Creator:
Hunsader, Patricia D
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:
2005
Subjects

Subjects / Keywords:
Mathematics
Problem solving
Gender
Reading ability
Mathematics ability
Self-efficacy
Self-assessment
Self-regulation
Dissertations, Academic -- Early Childhood Education -- Doctoral -- USF
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Abstract:
ABSTRACT: The purpose of this non-experimental, causal-comparative study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. The investigation addressed five research questions. First, to what extent are students' mathematical problem-solving processes related to gender? Second, to what extent is children's performance in mathematical problem solving related to gender, reading ability, and mathematics ability? Third, to what extent is the quality of students' linguistic explanations of their problem-solving processes related to gender, reading ability and mathematics ability? Fourth, to what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem and their actual performance related to gender, reading ability and mathematics ability? Finally, to what extent is students' ability to assess their own performance on mathematical problem-solving tasks related to gender, reading ability, and mathematics ability? After being briefed in the use of the self-efficacy and self-assessment reporting scales, the students from 16 fifth-grade classrooms were tested with a 12-item mathematical problem-solving test. The final sample consisted of 237 students, 129 boys and 108 girls. All student responses were scored for performance and linguistic explanation using holistic rubrics, and were coded according to the solution process employed. The results indicate that gender does not play a significant role in students' choice of problem-solving processes. As expected, mathematics ability was significantly related to performance as was reading ability. Gender was not found to be a significant predictor of performance. Reading ability and mathematics ability were both strongly related to the quality of students' linguistic explanations of their problem-solving processes, but gender was not. Boys consistently exhibited higher levels of self-efficacy, but girls were more accurate in their self-efficacy feelings. Reading ability was also found to be a significant predictor of the accuracy of students' self-efficacy feelings, but mathematics ability was not. Reading ability was found to be the strongest predictor of the accuracy of students' self-assessment, with gender also showing a significant relationship.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2005.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Patricia D. Hunsader.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 482 pages.
General Note:
Includes vita.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001911226
oclc - 173666306
usfldc doi - E14-SFE0001185
usfldc handle - e14.1185
System ID:
SFS0025506:00001




Full Text



PAGE 1

Lessons Learned About Boys' and Girls' Mathematical Problem Solving: The Solution Processes, Performance, Linguistic Explanations, Self-Efficacy, and Self-Assessment of Fifth-Grade Students of Varying Reading and Mathematics Abilities
by
Patricia D. Hunsader
A dissertation in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Childhood Education
College of Education
University of South Florida
Co-Major Professor: James R. King, Ed.D.
Co-Major Professor: Denisse R. Thompson, Ph.D.
Jeffrey D. Kromrey, Ph.D.
Susan Homan, Ph.D.
Gladis Kersaint, Ph.D.
Date of Approval: June 6, 2005
Keywords: mathematics, problem solving, gender, reading ability, mathematics ability, self-efficacy, self-assessment, self-regulation
Copyright 2005, Patricia D. Hunsader

PAGE 2

DEDICATION
This work is dedicated to my family. To my husband and best friend Mike, your unfailing support and encouragement are what prompted me to begin this journey, and you have remained steadfast to the end. After nearly twenty-five years together you still listen to me go on and on and on. Thank you. To my children, Jared, Zach, Johnny, Tyler, and Alyssa, thanks for reminding me every day in all your little ways that there is more to life than work. Together, you make every day of my life a joyous adventure.

PAGE 3

ACKNOWLEDGEMENTS
I would like to thank my committee for truly being my dissertation team. To Dr. King, thank you for your wisdom and encouragement. Your belief in me inspires me to keep reaching. To Dr. Thompson, thank you for applying your phenomenal editing skills to so many versions of this manuscript. You have given so generously of your most precious resource, your time. Dr. Kromrey, for someone of such great intellect, you touch me with your humility. Thank you for always being kind and gentle in your corrections. To Dr. Homan, without you, I would not have even begun this program. Thanks for being my advocate from the first day to the last. To Dr. Kersaint, thank you for teaching me so much. I could not imagine writing this literature review without the experience you provided. I see pieces of what I have gained from each of you throughout this manuscript. Each one of you has been a blessing to me.

PAGE 4

TABLE OF CONTENTS LIST OF TABLES................................................................................................vii LIST OF FIGURES...............................................................................................x ABSTRACT..........................................................................................................xi CHAPTER 1. INTRODUCTION...........................................................................1 Statement of the Problem..........................................................................8 Purpose of the Study.................................................................................9 Research Questions................................................................................10 Significance of the Study.........................................................................12 Limitations of the Study............................................................................13 Assumptions of the Study........................................................................15 Definitions................................................................................................16 Organization of the Manuscript................................................................18 CHAPTER 2. REVIEW OF LITERATURE.........................................................20 Gender Differences in Mathematics.........................................................20 Gender Differences in Mathematical Performance........................22 Studies Focused on Students in the United States............22 Cross-National Studies.......................................................25 Gender Differences in Test Item Performance..............................29 Item Format........................................................................29 Item Content.......................................................................33 Gender Differences in Strategy Use..............................................34 Gender Differences in Affect.........................................................37 Summary of Gender Differences in Mathematics..........................39 Reading in Problem Solving.....................................................................43 The Relationship Between Reading and Mathematics..................43 Challenges of Reading Mathematical Text....................................45 Complexity of Word Problems............................................45 Comparison to Other Forms of Text...................................46 Vocabulary.........................................................................48 Symbols..............................................................................49 Other Challenges...............................................................50 Readability....................................................................................52 i

PAGE 5

Reading Comprehension and Word Problems..............................54 Highlighted Research on Reading and Problem Solving...............57 Problem Wording Research...............................................57 Disabilities Research..........................................................60 Accommodations Research................................................65 Reading Comprehension Research...................................68 Gender Differences in Reading/Literacy.......................................70 Sources of Boys Underachievement.................................71 Highlighted Research on Gender Differences in Reading/Literacy.............................................................75 Summary of Reading Research....................................................77 Self-Regulated Learning (SRL)................................................................78 The Concept and Nature of SRL...................................................78 Individual Differences in SRL........................................................86 Classroom Applications of SRL.....................................................87 SRL in the Field of Mathematics...................................................89 Proficient Problem Solvers.................................................90 Less Proficient Problem Solvers.........................................91 Highlighted SRL Research Findings.............................................93 Summary of Self-Regulated Learning...........................................95 Self-Efficacy.............................................................................................96 Self-Efficacy Measurement and Relationship to Performance......98 Self-Efficacy Versus Self-Concept..............................................102 Self-Efficacys Role in Self-Regulation........................................103 Self-Efficacy and Math-Related Career Choices.........................104 Intervention Efforts for Performance and Career Choices...........106 Gender and Self-Efficacy............................................................107 Self-Efficacy Summary................................................................109 Self-Assessment....................................................................................112 Highlighted Research on Self-Assessment.................................114 Summary of Self-Assessment.....................................................117 Written Communication in Problem Solving...........................................119 Challenges and Benefits of Written Communication...................120 Metacognition in Writing..............................................................122 Writing as Assessment Data.......................................................123 Highlighted Research on Writing and Problem Solving...............125 Gender Differences in Writing.....................................................127 Summary of Writing.....................................................................129 How the Review of Literature Informed the Proposed Study.................131 CHAPTER 3. 
METHODS................................................................................133 Design....................................................................................................133 Population and Sample Selection..........................................................137 Participant Demographics......................................................................143 Participation Levels by School and Gender................................143 Treatment of the Data.................................................................144 ii

PAGE 6

Final Sample Demographics.......................................................147 Instrumentation......................................................................................151 Description of the Testing Instrument..........................................151 Measuring Self-Efficacy and Self-Assessment............................152 Development of the Test.............................................................153 Validity........................................................................................158 Reliability of the Data.............................................................................160 Cronbach Alpha.........................................................................160 Interrater Reliability Training and Scoring..................................161 Member Check Interviews..........................................................165 Problem-Solving Processes............................................168 Self-Efficacy....................................................................169 Self-Assessment.............................................................170 Data Collection......................................................................................171 Organization of Testing...............................................................172 Analysis of Data.....................................................................................175 Research Question One..............................................................175 Research Questions Two and Three...........................................179 Research Questions Four and Five.............................................183 Summary...............................................................................................185 CHAPTER 4. 
FINDINGS..................................................................................187 Question One: Findings for Problem-Solving Processes.......................187 Coding and Emergent Themes..................................................188 Descriptive Statistics..................................................................193 Inferential Statistics....................................................................195 Question Two: Findings for Mathematical Performance........................198 Descriptive Statistics..................................................................199 Descriptive Analysis by Question....................................199 Analysis of Overall Sample.....................................199 Analysis by Group...................................................201 Descriptive Analysis of Total Mathematics Performance Scores...................................................203 Correlations Between Mathematics Performance and Other Variables....................................................205 Inferential Statistics....................................................................209 Factorial ANOVA Results................................................209 Multiple Regression Analysis..........................................215 Introduction.............................................................215 Descriptive Statistics...............................................216 Multiple Regression Results...................................216 Assumptions...........................................................219 Conclusion..............................................................220 Question Three: Findings for Linguistic Explanations............................220 Descriptive Statistics..................................................................221 Descriptive Analysis by Question....................................221 iii

PAGE 7

Analysis of Overall Sample.....................................221 Analysis by Group...................................................223 Descriptive Analysis of Total Linguistic Explanation Scores..........................................................................224 Inferential Statistics....................................................................226 Factorial ANOVA Results................................................226 Multiple Regression Analysis..........................................231 Introduction.............................................................231 Multiple Regression Results...................................232 Assumptions...........................................................234 Conclusion..............................................................235 Qualitative Analysis of Students Linguistic Explanations...........236 Question Four: Findings for Self-Efficacy...............................................239 Descriptive Statistics..................................................................241 Descriptive Analysis of Total Self-Efficacy Scores..........241 Overall Analysis of Groups.....................................241 Analysis of Overall Sample by Question.................241 Analysis of Groups by Question..............................244 Descriptive Analysis of Self-Efficacy Mean Bias.............245 Overall Analysis of Groups.....................................245 Analysis of Overall Sample by Question.................248 Analysis of Groups by Question..............................249 Inferential Statistics....................................................................250 Factorial ANOVA Results................................................250 Multiple Regression Analysis..........................................255 Introduction.............................................................255 Multiple Regression Results...................................255 Assumptions...........................................................258 Conclusion..............................................................259 Qualitative Analysis of Students Rationale for Self-Efficacy Ratings....................................................................................259 Question Five: Findings for Self-Assessment........................................263 Descriptive Statistics..................................................................264 Descriptive Analysis of Total Self-Assessment Scores...264 Overall Analysis of Groups.....................................264 Analysis of Overall Sample by Question.................266 Analysis of Groups by Question..............................267 Descriptive Analysis of Self-Assessment Mean Bias......268 Overall Analysis of Groups.....................................268 Analysis of Overall Sample by Question.................271 Analysis of Groups by Question..............................273 Inferential Statistics....................................................................273 Factorial ANOVA Results................................................273 Multiple Regression Analysis..........................................277 Introduction.............................................................277 Multiple Regression Results...................................278 iv

PAGE 8

Assumptions...........................................................281 Conclusion..............................................................282 Teachers Assessment of Their Students Performance.............283 CHAPTER 5. SUMMARY, CONCLUSIONS, IMPLICATIONS, AND RECOMMENDATIONS.................................................................................284 Summary of the Study...........................................................................284 Conclusions and Implications................................................................293 Research Question One: Problem-Solving Processes..............293 Problem-Solving Process Conclusions...........................293 Problem-Solving Process Implications............................295 Research Question Two: Mathematical Performance...............296 Mathematical Performance Conclusions.........................296 Mathematical Performance Implications.........................298 Research Question Three: Linguistic Explanations...................299 Linguistic Explanation Conclusions.................................299 Linguistic Explanation Implications..................................301 Research Question Four: Self-Efficacy......................................302 Self-Efficacy Conclusions................................................302 Self-Efficacy Implications................................................304 Research Question Five: Self-Assessment...............................306 Self-Assessment Conclusions.........................................306 Self-Assessment Implications.........................................308 Recommendations for Practice..............................................................309 Recommendations for Future Research................................................317 REFERENCES.................................................................................................322 APPENDIX A. FCAT Mathematics Performance Scoring Rubric....................357 APPENDIX B. Amended Rubric for Scoring Linguistic Explanations..............359 APPENDIX C. Pilot Study Rubric for Scoring Linguistic Explanations............361 APPENDIX D. Sample Student Work from Pilot Study to Illustrate Scoring.................................................................................363 APPENDIX E. FCAT Achievement Levels......................................................366 APPENDIX F. Pilot Study...............................................................................368 APPENDIX G. Teacher Rating Sheet..............................................................390 APPENDIX H. Parental Informed Consent......................................................392 APPENDIX I. Informed Consent Cover Letter...............................................397 APPENDIX J. Problem Solving Test Instrument.............................................399 APPENDIX K. Item-Specific Rubrics for Mathematics Scoring.......................414 APPENDIX L Item-Specific Rubrics for Linguistic Explanation Scoring.........427 APPENDIX M. Solution Processes Identified for Each Question.....................440 APPENDIX N. Descriptive Statistics for Mathematics Performance by Question by Group............................................................454 APPENDIX O. Descriptive Statistics for Linguistic Explanations by Question by Group............................................................458 v

PAGE 9

APPENDIX P. Descriptive Statistics for Self-Efficacy by Question by Group..462 APPENDIX Q. Descriptive Statistics for Self-Efficacy Mean Bias by Question by Group............................................................466 APPENDIX R. Descriptive Statistics for Self-Assessment by Question by Group................................................................................470 APPENDIX S. Descriptive Statistics for Self-Assessment Mean Bias by Question by Group............................................................474 APPENDIX T. Multiple Regression Summary Tables Including Interactions..478 ABOUT THE AUTHOR............................................................................End Page vi

PAGE 10

LIST OF TABLES Table 1 Results of Mathematical Gender Differences Studies Grouped by Topic..........................................................40 Table 2 Results of Reading Studies Grouped by Topic.............................79 Table 3 Results of Self-Regulation in Mathematics Studies.......................96 Table 4 Results of Self-Efficacy Studies Grouped by Age of Participants...................................................................110 Table 5 Results of Self-Assessment Studies...........................................118 Table 6 Results of Writing Studies Grouped by Topic..............................130 Table 7 Categorization of Student Ability.................................................142 Table 8 Participation Levels of Boys and Girls by School........................144 Table 9 Cell Distribution of Students for Factorial ANOVA and Multiple Regression Analyses...................................................149 Table 10 Mean FCAT Scores for Sample Grouped by School and by Gender..........................................................................150 Table 11 Description of Test Questions Selected for the Study.................157 Table 12 Summary of Data Sources and Methods....................................176 Table 13 Categories of Themes and Problem-Solving Processes.............190 Table 14 Descriptive Statistics for Coding of Problem-Solving Process Themes.......................................................................194 Table 15 Chi-Square Results for Problem-Solving Process Themes.........196 Table 16 Problem-Solving Process Themes for the Pencil Problem..........197 vii

PAGE 11

Table 17 Descriptive Statistics for Mathematics Performance Scores.......200 Table 18 Total Mathematics Performance Scores by Group......................203 Table 19 Teacher Rating of Student Preparedness by Question...............207 Table 20 Factorial ANOVA Results for Total Mathematics Performance Scores.......................................................................................212 Table 21 Summary of Multiple Regression Analysis for Mathematics Performance........................................................217 Table 22 Descriptive Statistics for Linguistic Explanation Scores..............221 Table 23 Total Linguistic Explanation Scores by Group.............................224 Table 24 Factorial ANOVA Results for Total Linguistic Explanation Scores.......................................................................................229 Table 25 Summary of Multiple Regression Analysis for Linguistic Explanations..............................................................232 Table 26 Total Self-Efficacy Scores by Group...........................................242 Table 27 Descriptive Statistics for Self-Efficacy Scores.............................243 Table 28 Self-Efficacy Mean Bias Scores by Group..................................245 Table 29 Self-Efficacy Mean Bias Scores by Question..............................249 Table 30 Factorial ANOVA Results for Self-Efficacy Mean Bias Scores....253 Table 31 Summary of Multiple Regression Analysis for Self-Efficacy Mean Bias............................................................256 Table 32 Total Self-Assessment Scores by Group....................................265 Table 33 Descriptive Statistics for Self-Assessment Scores......................266 Table 34 Self-Assessment Mean Bias Scores by Group...........................269 Table 35 Self-Assessment Mean Bias Scores by Question.......................271 Table 36 Factorial ANOVA Results for Self-Assessment Mean Bias Scores.......................................................................................275 viii

PAGE 12

Table 37 Summary of Multiple Regression Analysis for Self-Assessment Mean Bias.....................................................279 Table 38 Categorization of Pilot Students..................................................374 Table 39 Pilot Study Summary of Performance Rubric Scores by Item......................................................................................382 Table 40 Summary of Pilot Data by Gender...............................................388 Table 41 Summary of Pilot Data by Item....................................................389 ix

PAGE 13

LIST OF FIGURES Figure 1 Box and Whisker Plot of Total Performance Scores Grouped by Gender, Mathematics Ability, and Reading Ability...................................................................205 Figure 2 Total Mathematics Score Interaction of Reading Ability and Gender...............................................................................214 Figure 3 Box and Whisker Plot of Total Linguistic Explanation Scores Grouped by Gender, Mathematics Ability, and Reading Ability...................................................................225 Figure 4 Box and Whisker Plot of Self-Efficacy Mean Bias Scores Grouped by Gender, Mathematics Ability, and Reading Ability...................................................................247 Figure 5 Box and Whisker Plot of Self-Assessment Mean Bias Scores Grouped by Gender, Mathematics Ability, and Reading Ability...................................................................270 x

PAGE 14

Lessons Learned About Boys' and Girls' Mathematical Problem Solving: The Solution Processes, Performance, Linguistic Explanations, Self-Efficacy, and Self-Assessment of Fifth-Grade Students of Varying Reading and Mathematics Abilities
Patricia D. Hunsader
ABSTRACT
The purpose of this non-experimental, causal-comparative study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. The investigation addressed five research questions. First, to what extent are students' mathematical problem-solving processes related to gender? Second, to what extent is children's performance in mathematical problem solving related to gender, reading ability, and mathematics ability? Third, to what extent is the quality of students' linguistic explanations of their problem-solving processes related to gender, reading ability and mathematics ability? Fourth, to what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem and their actual performance related to gender, reading ability and mathematics ability? Finally, to what extent is students' ability to assess their

PAGE 15

own performance on mathematical problem-solving tasks related to gender, reading ability, and mathematics ability? After being briefed in the use of the self-efficacy and self-assessment reporting scales, the students from 16 fifth-grade classrooms were tested with a 12-item mathematical problem-solving test. The final sample consisted of 237 students, 129 boys and 108 girls. All student responses were scored for performance and linguistic explanation using holistic rubrics, and were coded according to the solution process employed. The results indicate that gender does not play a significant role in students' choice of problem-solving processes. As expected, mathematics ability was significantly related to performance as was reading ability. Gender was not found to be a significant predictor of performance. Reading ability and mathematics ability were both strongly related to the quality of students' linguistic explanations of their problem-solving processes, but gender was not. Boys consistently exhibited higher levels of self-efficacy, but girls were more accurate in their self-efficacy feelings. Reading ability was also found to be a significant predictor of the accuracy of students' self-efficacy feelings, but mathematics ability was not. Reading ability was found to be the strongest predictor of the accuracy of students' self-assessment, with gender also showing a significant relationship.
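The conclusion that gender was unrelated to students' choice of solution process was evaluated with a chi-square analysis (Table 15 in the list of tables). The sketch below is a hedged illustration only of how such a test of independence between gender and coded process theme could be run: the process labels, the cell counts, and the use of scipy are assumptions for the example and do not come from the dissertation, although the row totals match the reported 129 boys and 108 girls.

```python
# Hedged illustration only: hypothetical counts of students by gender and
# coded solution-process theme. Theme names and cell splits are invented;
# row totals mirror the reported sample (129 boys, 108 girls).
from scipy.stats import chi2_contingency

observed = [
    [48, 35, 46],   # boys:  e.g., algorithmic, pictorial, estimation themes
    [40, 31, 37],   # girls: same hypothetical themes
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
# A large p-value on real counts would be consistent with the reported
# finding that gender is not significantly related to process choice.
```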

PAGE 16

CHAPTER 1
INTRODUCTION
Unlike times past when a teacher's judgment alone was regarded as a complete and accurate assessment of student achievement, today's emphasis has shifted away from classroom evidence of learning to basing conclusions on scores of standardized tests. Although future high-stakes assessments in some states will begin to incorporate science and other content areas, the emphasis to date has strictly been on reading, writing, and mathematics (Hunsader, 2002a). Reading and mathematics cannot be separated in the testing arena because mathematical standards, and the tests that are designed to assess them, have de-emphasized strict computation in favor of contextualized word problems. These word problems rely on students' ability to read and comprehend the problem situation before solving (National Council of Teachers of Mathematics [NCTM], 2000). Before the advent of the NCTM Standards (1989, 2000), the traditional mathematics classroom with its traditional computational assessments stood alone as a subject. However, current practices in the reform-influenced mathematics classroom connect mathematics to all subjects, and the ability to read, reason, and communicate is essential to success. The No Child Left

PAGE 17

Behind legislation calls for more testing than ever before, and the focus on standards-based testing has resulted in an increase in the proportion of problems set in real world contexts requiring linguistic justification of answers. In today's high-stakes testing environment, success in mathematics depends heavily on the integration of skills and processes previously confined to literacy (Adams, 2003). No longer can the teacher of mathematics treat the subject as an island, alienated from other content areas. The reforms encouraged by the National Council of Teachers of Mathematics in their standards publications (NCTM 1989, 2000) call for de-emphasizing the teaching of discrete skills, focusing instead on a more integrated problem-solving approach. "Unless students can solve problems, the facts, concepts, and procedures they know are of little use" (NCTM, 2000, p. 182). Problem-solving tasks differ in quality from simple exercises. Simple exercises are generally presented in a strictly symbolic format, requiring only the selection and application of an effective algorithm to complete the solution process. Word problems for which the student has developed such a high level of proficiency that they have become routine would also be considered simple exercises. True problems, however, not only require students to read and comprehend the problem situation, but also necessitate that students analyze alternative solution paths and select a productive strategy, then finally communicate either verbally, in writing, or both, the steps that were used to solve the problem. Whereas exercises are characterized by one correct answer generally arrived at through one specific algorithm, true problems may have multiple correct answers, or

PAGE 18

multiple ways of arriving at a single correct answer. In contrast to exercises, problems may require the student to make judgments about procedures, monitor their thinking throughout the solution process, and explain their thinking in writing (Hong, 1995; Krulik & Rudnick, 1987; McIntosh & Draper, 1996; Schoenfeld, 1989). The mathematics reform movement draws attention to the supportive relationship between mathematics and reading, and purports that they should be taught and learned together. The National Council of Teachers of Mathematics argues that "knowing mathematics is doing mathematics" (1989). Adams (2003) follows that "doing mathematics is reading mathematics" (p. 794). Because it is the words, symbols, and numerals that give mathematics its substance and framework, these same words, symbols, and numerals must be used to communicate the problem situation to students so that students can use them to perform procedures, solve problems, and explain processes. Reading, in this sense, acts as a gatekeeper to problem solving. In support of this notion, correlational studies show that there is a substantial relationship between children's mathematical and reading performance (Flem & Sovik, 1993; Hecht, Torgeson, Wagner, & Rashotte, 2001; Sovik, Frostrad, & Lie, 1994). Reform-based problem solving, involving contextualized problems without a prescribed algorithm, requires that students be self-regulated learners who are "metacognitively, motivationally, and behaviorally active participants in their own learning" (Zimmerman, 1990, p. 4). Metacognitively, self-regulated learners set their own goals, monitor their progress, and self-evaluate throughout the learning

PAGE 19

process. In terms of motivation, self-regulated learners feel confident about their ability to be successful, and are intrinsically interested in learning. Behaviorally, self-regulated learners actively work to create environments that are conducive to learning (Zimmerman, 1990). Students level of self-efficacy and their ability to assess their own work with some level of accuracy are important elements in determining the extent to which they successfully engage in self-regulatory strategies. The current economy of the United States requires that workers are able to apply their mathematical knowledge to solve novel and authentic problems (Kramarski, Mevarech, & Arami, 2002). Unfortunately, large numbers of students of all ages fail to demonstrate proficiency in problem solving (Neef, Nelles, Iwata, & Page, 2003), and do not have the skills necessary to solve mathematics-related problems in their everyday lives or in future work situations (Bottge, 1999). One way to bring emphasis in the classroom to these critical skills is for states to continue to push for educational accountability through high-stakes assessments that measure problem-solving skills. As the reform movement and high-stakes testing change the face of mathematics, it is necessary to ensure that all children, regardless of personal characteristics, are still provided opportunities to be successful in mathematical problem solving (NCTM, 2000). This does not imply that all students should receive the same instruction, but that efforts should be made to provide strategic instruction that affords all children the opportunity to succeed mathematically. Unfortunately, girls and boys have not historically reached the same levels of 4

PAGE 20

achievement in mathematics. In past generations, where differences existed between girls and boys, girls scored higher than boys on lower level basic skills while boys outscored girls on higher level reasoning skills, including problem solving (Fennema, 1974). Maccoby (1966) found that gender differences in reasoning ability existed as early as the upper elementary grades in favor of boys. Additionally, these differences continued through high school with boys consistently outperforming girls in measures of higher-level mathematical reasoning. In recent years, the disturbing trend of boys underachievement in literacy has been gaining attention. Studies show that boys account for 75 to 85 percent of those labeled at-risk for poor achievement progress in literacy (Rowe, 2000), and boys are three to five times more likely than girls to be labeled reading/learning disabled (Young & Brozo, 2001). As reading and writing become more crucial in the mathematics classroom, this development may impact boys performance in high-level reasoning activities, such as problem solving, that rely on literacy skills. Business Week (Conlin, 2003) reports that from kindergarten to graduate school, girls now outperform boys in reading and overall grades, and are catching up to boys in mathematics. Although the connection between performance in reading and mathematics has not been established here, it certainly warrants further inspection. The research relating reading to problem solving is fairly extensive, but critical gaps still remain. For example, in most studies relating reading ability to problem-solving ability (Fan, Mueller, & Marini, 1994; Fuchs, Fuchs, Eaton, 5

PAGE 21

Hamlett, & Karns, 2000; Helwig, Rozek-Tedesco, Tindal, Heath, & Almond, 1999; Helwig, Rozek-Tedesco, & Tindal, 2002; Johnson, 2000; Jordan, Kaplan, & Hanich, 2002; Jordan & Montani, 1997, Tindal, Heath, Hollenback, Almond, & Harniss, 1998) the problems were read aloud to the students. Although this method does give insight into how accommodations may impact poor readers, it does not allow for any conclusions to be drawn about poor readers independent performance on problem-solving tasks. Another weakness in the research is that most studies have either used multiple examples of the same problem type in their tests (Bernardo, 1999; Fan, Mueller, & Marini, 1994; Leong & Jerred, 2001), or have administered tests containing questions significantly below the students reading and mathematics level (Sovik, Frostad, & Heggberget, 1999). The result is a wealth of data that is either severely limited in its generalizability, or does not allow the researcher to distinguish differences in the problem-solving performance of poor readers versus good readers. Given the trend of boys underachievement in literacy, the most striking gap in the research is the dearth of studies that include gender as a variable in the relationship between reading ability and problem solving. The NCTM Standards (2000) have also resulted in an increase in the use of writing activities in mathematics, and students are now often expected to explain their thought processes while solving problems in the classroom and on high-stakes tests (Gurganus & Del Mastro, 1998). This writing may lead to improved conceptual understanding (Quinn & Wilson, 1997), and may serve to reveal misconceptions and gaps in students thought processes that may not be 6

PAGE 22

exposed through traditional assessments (Krulik & Rudnick, 1994; MacGregor, 1990). However, there is scant research that seeks to determine the differential impact on boys and girls of including linguistic explanations as a criterion in the assessment of problem solving. This is cause for concern because in the elementary grades girls tend to score higher than boys on measures of writing proficiency (Knudson, 1995). Gender-related differences in general problem solving ability have been empirically demonstrated, but the focus of much of the research has been on the product of problem solving (the answer) rather than the solution process (the steps taken to arrive at the answer). When the focus is strictly on product, gender differences in the elementary and middle grades are not evident (Fennema, Carpenter, Jacobs, Franke, & Levi, 1998; Tarte & Fennema, 1995). However, when examining the solution processes children use during problem solving, gender differences appear as early as first grade (Carr, Jessup, & Fuller, 1999; Fennema et al., 1998). Gender related differences in problem-solving processes beyond the primary grades have received very little attention in the literature. When the solution processes have been studied, they have been examined in relation to boys and girls ability to follow a prescribed problem-solving heuristic (Zambo, 1990), rather than in an attempt to uncover inherent differences in their choice of problem-solving processes. The relationship between affect and mathematics achievement has garnered much attention in the research due to the prevalence of affective differences between boys and girls. Girls typically have less favorable beliefs 7

PAGE 23

about their mathematical ability than boys, and boys tend to exhibit more confidence than girls in their mathematical ability, regardless of their actual achievement (Stipek & Gralinski, 1991; Seegers & Boekaerts, 1996; Vermeer, Boekaerts, & Seegers, 2000). None of these studies, however, attempted to relate boys and girls feelings of self-efficacy to actual performance on a per-item basis, even though high self-efficacy has been shown to impact performance positively (Maddux, 2002; Pajares, 1996). Also, none of the studies attempted to gauge students ability to assess their own performance on individual problems. Statement of the Problem The problem investigated in this study was the lack of knowledge about how gender, reading ability, and mathematics ability are related to childrens problem solving processes, their proficiency in providing linguistic explanation of those solution processes, the accuracy of their feelings of self-efficacy, and their ability to assess their own work. To understand more clearly the root of differences between boys and girls, researchers must pay as much attention to how boys and girls think and the processes they use to solve problems as they do to achievement scores. As literacy becomes more integrated into mathematics instruction and assessment, the question arises as to whether this change will have the same impact on girls and boys. Whereas boys have historically outperformed girls in high-level mathematics involving multi-step problem solving and reasoning (National Assessment of Educational Progress [NAEP], 1986), the recent trend of boys underachievement in reading (Rowe, 8

PAGE 24

2000; Young & Brozo, 2001) may be a precursor to a reversal of this tendency. In addition, the increased use of contextualized problems calls for greater use of self-regulatory skills. How boys and girls respond to problem situations in their feelings of self-efficacy and their ability to self-assess their work may serve to further exacerbate gender differences. In the current educational environment, students performance on high-stakes testing is used to make important decisions about their academic future. Because mathematical problem solving is an important element of these high-stakes tests, researchers must work to understand how gender, reading ability, and mathematics ability impact students mathematical problem solving behavior. Purpose of the Study The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. Previous research has shown that boys and girls think differently when solving problems, but because most of this work has been conducted with first-, second-, and third-grade children, little is known about how these differences may manifest themselves in complex problem-solving situations in the upper elementary grades. The problems that were used in this study were situated in real world contexts, and included both a range of difficulty and a variety of 9

PAGE 25

possible operations. In keeping with current findings related to the relationship between reading and problem solving, this study differentiated reading ability as an independent variable. Because reform efforts call for students to explain or justify their thinking in writing, all test questions instructed students to provide a linguistic explanation of their work, and those explanations were holistically scored and analyzed. Boys and girls differ in their feelings of self-efficacy, and because self-efficacy has been shown to impact performance, students were provided with a means of recording their level of self-efficacy for every problem tested. Finally, as students' self-regulatory behaviors are critical to problem solving, and their ability to assess their own work is critical to self-regulation, students were also provided with a means of assessing their performance for every problem tested. Together, the information garnered from these elements of the study provides a unique contribution to the body of knowledge about gender differences in problem solving.
Research Questions
Very little research has examined gender differences in students' problem solving processes, performance, self-efficacy, self-assessment, and linguistic explanations, and the impact that their reading and mathematics ability have on these issues. For this reason, the research questions for this study were as follows (an illustrative analysis sketch follows the list of questions):
1) To what extent are students' choices of mathematical problem-solving processes related to gender?

PAGE 26

2) To what extent is children's performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
3) To what extent is the quality of students' linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
4) To what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported rating scale, and their actual performance on a mathematical problem solving test as measured by a holistic scoring rubric related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
5) To what extent is students' ability to assess their own performance on mathematical problem solving tasks as measured by a self-reported rating scale related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
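Research questions two through five relate a continuous outcome (for example, total performance scored with a holistic rubric) to three categorical factors: gender, reading ability, and mathematics ability, and the table of contents names factorial ANOVA and multiple regression as the analyses. The sketch below is a minimal, hypothetical illustration of how such a model could be specified; the fabricated data, the three-level ability groupings, the 0-48 score scale, and the use of pandas/statsmodels are all assumptions for the example, not the author's procedure.

```python
# Hypothetical sketch of the kind of model implied by research questions 2-5.
# Data, group codings, score scale, and variable names are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 240
df = pd.DataFrame({
    "gender": rng.choice(["boy", "girl"], size=n),
    "reading": rng.choice(["low", "middle", "high"], size=n),
    "math": rng.choice(["low", "middle", "high"], size=n),
})
# Fabricated total score; assumes twelve items each scored 0-4 (scale is illustrative).
ability_bump = (df["math"].map({"low": 0, "middle": 6, "high": 12})
                + df["reading"].map({"low": 0, "middle": 3, "high": 6}))
df["performance"] = np.clip(20 + ability_bump + rng.normal(0, 5, n), 0, 48)

# Factorial-ANOVA-style model: main effects of gender, reading, and math ability.
model = smf.ols("performance ~ C(gender) + C(reading) + C(math)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # Type II sums of squares
print(model.params)                       # regression coefficients
```

Here the fabricated scores build in reading and mathematics effects and no gender effect, loosely mirroring the reported pattern, but the numbers demonstrate only the mechanics; in the actual study the ability groupings came from teacher ratings combined with FCAT scores rather than random labels.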

PAGE 27

Significance of the Study The primary goal of mathematics instruction is to teach students to solve practical problems (NCTM 1989, 2000) that will help prepare them for full participation in society (Miller, 1993). Solving word problems is a difficult process because it entails not only mathematics skills, but also reading comprehension in the translation of words to symbols and then into corresponding operations (Neef, Nelles, Iwata, & Page, 2003). NCTM (2000) calls for equity in mathematics education with high expectations and strong support for all students: girls and boys. Yet, boys and girls have historically differed in their problem-solving performance, with current statistics showing that boys are beginning to lag behind girls in literacy skills. Additionally, the literacy skill of reading comprehension has proven to be a critical element of the problem solving process. For these reasons, teachers, teacher educators, and curriculum developers who understand the differential relationship between reading ability and boys and girls problem-solving processes will be better equipped to provide mathematics education that is relevant to todays world and equitable to both genders than teachers who do not understand this relationship. Also, because reform-based assessments and high-stakes tests often require linguistic explanations of problem solving processes, it is important to examine whether potential differences in boys and girls ability to provide these explanations may have a differential relationship to their achievement on these tests when measured using a holistic scoring rubric. 12

PAGE 28

Self-efficacy and self-assessment have been proven to impact students' self-regulatory behaviors and performance in mathematics (Hassmén & Hunt, 1994; Koivula, Hassmén, & Hunt, 2001; Maddux, 2002; Pajares, 1996), and boys and girls differ in both their feelings of self-efficacy and their ability to self-assess (Pajares & Miller, 1997; Pallier, 2003; van Kraayenoord & Paris, 1997; Zimmerman & Martinez-Pons, 1990). The implication is that teaching mathematics so that students learn mathematics is a much more complex issue than content alone. If indeed girls' and boys' performance is differentially tied to these elements of self-regulation, teachers of mathematics, mathematics teacher educators, and even the parents of mathematics students will be well served by any new information that helps elucidate the relationships between self-efficacy, self-assessment, gender, and ability.

Limitations of the Study

The following list is provided to acknowledge and clarify the limitations of this study that may impact the generalizability of the findings:

1. Number and Operations is the only mathematics strand tested through the 12-item instrument that was developed for this study. If the instrument also included items related to algebra, geometry, measurement, and/or data analysis and probability, the study may net different results, and may be generalized to the field of mathematics education differently.

2. Participation in the study was voluntary, which may have resulted in the use of a biased sample of the population.
However, because motivation is an important element in students' willingness to respond carefully to all 12 extended-response items, the use of non-volunteers could have feasibly falsified the results more than the use of volunteers.

3. To provide complete coverage of the Number and Operations strand through extended-response questions would have resulted in an unreasonably long test. The researcher acknowledges that the 12 items included in the test did not measure all possible dimensions of Number and Operations.

4. The time required to analyze data from the testing event precluded the possibility of immediately performing the targeted follow-up interviews of students. The time lapse between the actual testing event and the interviews may have limited students' ability to recall what they were thinking during the problem-solving process.

5. The smiley-face Likert scales used by students to record their level of self-efficacy and self-assessment for each problem limited their ability to express a full range of thoughts regarding self-efficacy and self-assessment.

6. Complete FCAT scores were unavailable for forty-four students in the sample, thereby reducing the statistical power of the study.


Assumptions of the Study

The following list serves to clarify the premises under which the study was conducted:

1. The researcher, in the process of test administration, did nothing to bias the results of the evaluation procedure.

2. Children's responses to the 12-item extended-response test provided a sufficient level of information about students' thinking to allow for an accurate categorization of their problem-solving processes.

3. The structure of the targeted follow-up interviews, allowing students to review their own work before making statements about what they were thinking during the problem-solving process, allowed for some conclusions to be drawn as to what students were thinking during the actual testing event.

4. The use of previous standardized test scores together with teacher evaluation to categorize students as high, middle, or low readers, and high, middle, or low mathematics students, provided a sufficient level of accuracy to draw conclusions about these groups.

5. The 12 items chosen for the test instrument from released fourth-grade 1992, 1996, and 2003 National Assessment of Educational Progress (NAEP) items and released fifth-grade Florida Comprehensive Assessment Test (FCAT) items were appropriate for testing students in the first half of their fifth-grade year.


6. Although, based on the pilot study, the sample did not consist of an exactly even number of boys and girls, there were sufficient numbers of each to be able to draw conclusions about their problem-solving performance, solution processes, self-efficacy, self-assessment, and linguistic explanations.

Definitions

The following is a list of the terms and operational definitions that were used throughout this study:

Attribution: The reasons students give for their success or failure in an endeavor.

Calibration: The degree to which students' judgments of their capability reflect their actual competence (Pajares & Miller, 1997).

Confidence bias: A systematic error of judgment made by individuals when they assess the correctness of their responses to questions relating to intellectual or perceptual problems (Pallier et al., 2002).

Gender: Although some believe the dichotomization of gender to be artificial, and the nature of gender to be socially constructed, for the purposes of this study, gender is limited to the categories of male and female as determined strictly by genetics.

Linguistic explanations: The text provided by the student in an effort to explain her/his mathematical work on each problem. In the assessment process, symbols and numerals embedded in textual explanations were treated as text.


Mathematical problem: A task for which the student has no prescribed or memorized rules or methods for solving (Hiebert et al., 1996).

Problem-solving performance: The level of accuracy and completeness of problem-solving processes and solutions as measured by the Florida Comprehensive Assessment Test (FCAT) 4-point rubric.

Problem-solving processes: The steps taken to solve a problem as evidenced by students' written work on the problem-solving test and through discussion during the targeted follow-up interviews.

Self-assessment: A judgment of the success of one's performance on a task. In this regard, self-assessment might be considered to play an essential role in students' progress towards autonomy (Laveault, Leblanc, & Leroux, 1999). For this study, students' self-assessment is recorded by circling one of five faces on the smiley-face Likert scale presented below the workspace for each question.

Self-efficacy: A judgment of one's ability to perform a task within a specific domain (Bandura, 1997). For this study, students' level of self-efficacy is recorded by circling one of five faces on the smiley-face Likert scale presented after the problem text for each question.

Self-efficacy expectations: A person's belief concerning his or her ability to perform a given task or behavior successfully.

Self-regulated learning: The outcome of choosing to engage in self-directed metacognitive, cognitive, affective, and behavioral processes and skills (McCombs & Marzano, 1990).


Statistically significant: A given result has less than a five percent probability of being a function of chance (Ary, Jacobs, & Razavieh, 1996).

Triadic theory of social cognition: Students' efforts at self-regulation are determined by three factors: their personal processes, the environment, and their behavior (Bandura, 1986).

Organization of the Manuscript

This manuscript has been organized into five chapters. Chapter 1 identified the problem and placed it in context for the study. The research questions, limitations, assumptions, and definitions were also included. Chapter 2 reviews the literature relevant to the research questions. Research strands include (a) gender differences in mathematics, (b) reading in problem solving and gender differences in reading/literacy, (c) elements of self-regulation, specifically self-efficacy and self-assessment, and their relationship to mathematics, and (d) written communication in problem solving, including gender differences in writing. At the end of each major section of the literature review is a table that summarizes the research studies discussed in the section. Chapter 3 presents the methods that were used to conduct the study. The development process for the 12-item extended-response test is described, as well as the procedures for acquiring research participants, administering the test, conducting the targeted follow-up interviews, and analyzing the data. Chapter 4 summarizes the findings of the study. The descriptive statistics and findings derived from inferential data analysis are reported.
Chapter 5 presents the conclusions of the study, the resulting implications of the study results, and the recommendations for classroom practice and future research.
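Before turning to the review of literature, the calibration, self-efficacy, and self-assessment constructs defined earlier in this chapter can be made concrete with a small sketch. It shows one simple way the accuracy of a self-report could, in principle, be quantified by comparing a rating on a five-point scale with a rubric score rescaled to the same range. The five-point scale, the 0-4 rubric range, and the absolute-difference measure are illustrative assumptions for this sketch only; they are not the scoring procedures used in this study.

def rescale(score, old_min, old_max, new_min=1, new_max=5):
    # Linearly map a rubric score onto the range of the rating scale.
    return new_min + (score - old_min) * (new_max - new_min) / (old_max - old_min)

def calibration_error(rating, rubric_score, rubric_min=0, rubric_max=4):
    # Absolute gap between a 1-5 self-rating and the rescaled rubric score.
    # 0 means the self-report matched performance exactly; larger values mean
    # the student over- or under-estimated. Illustrative measure only.
    return abs(rating - rescale(rubric_score, rubric_min, rubric_max))

# Hypothetical example: a student reports the highest confidence (5) but earns a rubric score of 2.
print(calibration_error(5, 2))  # 2.0, an overestimate of two scale points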


CHAPTER 2

REVIEW OF LITERATURE

The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter has four research strands presented in four sections. The first section reviews gender differences in mathematics. The second section focuses on reading as it relates to problem solving and examines gender differences in reading ability. The third section reviews self-regulation as a concept, how it relates to mathematics, and how self-efficacy and self-assessment, as elements of self-regulation, are related to mathematics. The fourth section provides an overview of written communication in problem solving, followed by a review of the literature on gender differences in writing.

Gender Differences in Mathematics

NCTM (2000), in its publication Principles and Standards for School Mathematics, supports the belief that all children, regardless of personal characteristics or background, should have opportunities to study and learn mathematics. Unfortunately, girls have historically not reached the same levels of achievement in mathematics as boys. Although girls are currently outpacing boys in reading, are taking more advanced courses in high school, and are reporting higher educational aspirations (American Association of University Women, 2002; Young & Brozo, 2001), they continue to lag behind boys in enrollment and achievement in advanced mathematics courses (Carr, Jessup, & Fuller, 1999). It is important that our educational system do everything in its power to ensure that girls and boys are equally prepared to meet the mathematical challenges they will face in the workplace and in society (Croom, 1997). The purpose of this section of the literature review is to examine research findings regarding gender differences in students' mathematical lives, including the topics of performance, item format and content, strategy use, affect, and the broader context provided by cross-national studies. Through understanding the factors that generate gender differences, teacher educators will be equipped to incorporate gender-specific pedagogical principles into coursework for pre-service teachers of mathematics. New teachers will in turn be prepared to provide instruction that allows both boys and girls to be mathematically successful.
Although some of the studies of gender differences do not find statistically significant differences in achievement until the high school years, this review was conducted under the premise that students' mathematical experiences in the earlier grades influence the development of the differences that are found in later grades. It is, therefore, important not only to examine research findings that convey the extent of gender differences in achievement throughout children's school experience, but also to examine differences in non-achievement-based factors that may serve to shape and/or exacerbate achievement differences.

Gender Differences in Mathematical Performance

Studies Focused on Students in the United States

The study of gender differences in mathematical performance has captured a fair amount of attention in the literature since the early 1970s. In 1974, Fennema reviewed published studies and concluded that "No significant differences between boys' and girls' mathematics achievement were found before boys and girls entered elementary school or during early elementary years. In upper elementary and early high school years significant differences were not always apparent. However, when significant differences did appear they were more apt to be in the boys' favor when higher-level cognitive tasks were being measured and in the girls' favor when lower-level cognitive tasks were being measured" (pp. 136-137). No conclusions were made in this review about high school learners because of the limited research available on students of that age. Only a few years later, though, Fennema and Carpenter (1981) reported that the 1978 NAEP results indicated that there were gender differences in performance at the high school level, with males outperforming females, especially on high cognitive-level tasks.
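Findings such as these are conventionally summarized as standardized effect sizes rather than raw score gaps, and the meta-analytic results reviewed next are reported in those terms. As general statistical background (not a computation carried out in the present study), the standardized mean difference, Cohen's d, is defined as

$d = \frac{\bar{X}_{\text{boys}} - \bar{X}_{\text{girls}}}{s_{\text{pooled}}}, \qquad s_{\text{pooled}} = \sqrt{\frac{(n_{b}-1)s_{b}^{2} + (n_{g}-1)s_{g}^{2}}{n_{b} + n_{g} - 2}},$

where values near 0.2 are conventionally described as small and values near 0.5 as moderate. This is broadly the sense in which terms such as slight and moderate appear in the findings summarized below.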


In a meta-analysis of 100 studies conducted between 1967 and 1987, Hyde, Fennema, and Lamon (1990) concluded that there was a slight female superiority in elementary and middle school, a moderate male superiority in high school, and a larger male advantage in college and later adulthood. They stress that their findings in no way give credence to the global conclusion that boys excel in mathematics, because the overall differences were small and decreasing. In fact, the magnitude of the differences found in studies conducted in 1974 or after was less than half that found in earlier studies. They also caution against making any general statements about gender differences in performance because the pattern of differences that they found was complex, with gender differences varying according to task and age. Whereas girls are superior to boys in computation, there do not appear to be any gender differences in understanding mathematics concepts. Although boys and girls differ in their problem-solving abilities, the differences that favor boys do not appear until the high school years. Although Park, Bauer, and Sullivan's (1998) study viewed a much narrower segment of the population than the meta-analysis, namely high-ability students, their findings corroborate and bring current some of what Hyde et al. (1990) found in their meta-analysis. Just as in the Hyde et al. meta-analysis, girls outperformed boys in mathematics computation, and boys outperformed girls on problems requiring application of concepts. In contrast to the meta-analytic findings, they found that boys' scores on items testing mathematics concepts were higher than girls'.
Thompson, Strackbein, and Williams (1992) sought to determine whether specific intervention efforts could alter gender differences in achievement. In an experimental study using intervention to improve students' mental computation, they found that while all students benefited from the eight-week training on mental computation, the intervention had a much greater impact on girls' scores than boys' in both computation and problem solving. This study shows the potential of intervention efforts using appropriate instructional strategies for achieving balance in boys' and girls' mathematics performance, thereby providing incentive for further study. Using data from the National Longitudinal Survey of Youth (NLSY) and the National Educational Longitudinal Study (NELS), Leahey and Guo (2001) found small but significant differences in favor of girls until age 11, at which time boys exhibited a higher acceleration rate in scores. Across all age groups, the standard deviation of boys' scores exceeded that of girls. They concluded that the higher representation of boys at the extreme top of the mathematics ability scale, and the fact that boys exhibited a higher acceleration rate, may have contributed to the higher representation of males in mathematics-related careers. Ai's (2002) four-year longitudinal study confirmed these findings regarding boys' growth rates, but only among low-ability students. At the beginning of the study, the seventh-grade students were placed in high- and low-ability groups based on previous test scores. Gender differences in mathematical growth varied according to initial ability status. Within the low group, girls started higher than boys, but boys improved at a slightly faster rate. There were no differences in initial status or growth rate between boys and girls in the high-ability group.


The methodology issue that causes the greatest concern in this study is the use of students' scores from the seventh-grade administration of the test as both the sole grouping variable and the outcome variable. This method of grouping may have resulted in the conflict between the findings of this study related to the high-ability group and those of Leahey and Guo. Even given the differences that exist between boys and girls in mathematics, the representation of women in mathematics fields of study at the university level has been growing and is nearing parity. The percent of mathematics degrees being awarded to women has increased from 22.6 percent in 1950 to 47.2 percent in 1991 (Chipman, 1996). Chipman (1996) claims there is "no other field of study which comes so close to proportional representation of men and women" (p. 285). Although Leahey and Guo (2001) continue to be concerned about the underrepresentation of women in mathematics-related careers, the increase of women receiving mathematics degrees may soon bring an end to that inequity.

Cross-National Studies

The previous section focused strictly on studies conducted in the United States, but a review of gender differences in mathematics performance would not be complete without situating the topic in the broader context of results from cross-national studies. The Second International Mathematics Study (SIMS), under the sponsorship of the International Association for the Evaluation of Educational Achievement, was completed in 1982 with eighth- and twelfth-grade students in nineteen countries.
Among eighth-graders, boys performed better than girls in seven countries, there were no significant differences in eight countries, and in four countries girls outperformed boys (Baker & Jones, 1993). Further study of the gender stratification of opportunity in each of the countries indicated that as females gain more access to advanced training and the workplace, gender differences in mathematics performance decrease. Also, the occupational status of women in each country is a significant predictor of the size of the gender differences in test scores, with girls in countries with more opportunity for women scoring higher than girls in countries that offer less opportunity for women. Baker and Jones (1993) then compared the SIMS data to the data gathered in the First International Mathematics Study (FIMS), conducted in 1964, to see if gender differences in mathematical performance had declined over time. Nine countries participated in both studies, and in eight of those countries, the size of the superiority of boys decreased. In 1964, boys scored significantly higher than girls in eight of the nine countries. By 1982, four of these countries dropped towards parity between the sexes; in two additional countries, girls actually outperformed boys. These results indicate that across the world, gender differences in mathematics performance are decreasing. Also, because test scores were so closely tied to variations in women's access to higher education and the labor market, the data provide a serious challenge to the notion that gender differences in mathematics have a biological basis. This finding corroborates the work of Walkerdine (1998), who rejects the notion that one gender is genetically predisposed to perform better in mathematics, attributing
observed gender differences to the effects of environment and sex-role stereotyping over time. Ethington (1992) conducted a path analysis using a sub-sample of United States students from the SIMS data to determine whether there were gender differences in the factors that impact performance. Not surprisingly, prior achievement had the greatest influence on current achievement for both males and females. The personal value placed on mathematics was the only other variable that impacted male performance. In contrast, females were influenced by family help, their perception of gender stereotyping in mathematics, and their perception of the difficulty of mathematics. The direction of the influence of family help on performance was surprising. Other things being equal, females who are more independent and receive less help from their families, who do not stereotype mathematics as a male domain, and who view mathematics as less difficult are more likely to have higher levels of achievement. Another unanticipated result was that socio-economic status, perception of parents' attitudes, goals, and expectations for success did not directly impact performance for either gender. Fierros (1999) examined gender differences in mathematics achievement using data from the Third International Mathematics and Science Study (TIMSS). The data were gathered during the 1994-1995 school year at five grade levels in forty countries with over half a million students. Across countries at the eighth-grade level, there were few significant differences in mean achievement by gender, but differences that did exist tended to favor males. At the twelfth-grade
level, males did significantly better in measures of mathematical literacy (applying mathematics to everyday problems) and advanced mathematics. In the United States sample, there were no significant differences between eighth- or twelfth-grade males and females in any of the ability groupings. This contrasts with the international data, in which high-ability males significantly outperformed high-ability females at both grade levels. The Program for International Student Assessment (PISA) is a new system of international assessments sponsored by the Organization for Economic Cooperation and Development aimed at understanding 15-year-olds' abilities in reading literacy, mathematics literacy, and science literacy (Lemke et al., 2001). The assessment, first conducted in 2000, is scheduled to be repeated every three years. The results of the first test administration indicated that boys outperformed girls in mathematics literacy in 16 of the 31 countries that participated in the mathematics portion of the assessment. The United States was among the countries for which the differences were not statistically significant. In summary, although gender differences in mathematics achievement continue to exist in some countries, the balance is shifting away from male domination. In countries where differences remain, they are most pronounced among high-ability students (Leder, 1990). The primary cause for these achievement differences is closely tied to women's lack of access to higher education and the labor market. In countries where women have equal opportunity, such as the United States, differences in achievement that existed at
the time of the FIMS are no longer significant. Although males and females in this country are now achieving similar scores on international assessments, gendered performance on the primary predictors of that achievement continues to differ.

Gender Differences in Test Item Performance

One indication that mathematical reform is taking hold is the shift away from assessments that are purely computation-based using multiple-choice questions towards assessments that include open-ended problem-solving tasks situated in real-world contexts (National Assessment Governing Board, 1995; Willingham & Cole, 1997). This shift raises questions about how test item format and content may differentially impact boys' and girls' performance and attitudes. Different skills are required for multiple-choice and constructed-response formats, and these may differ by gender. For example, some studies have shown that girls have higher verbal fluency than boys (Halpern, 1992), giving girls an advantage over boys on constructed-response tasks that require writing. This advantage has proven to impact test scores on the Advanced Placement U.S. history test, with girls consistently outscoring boys on constructed-response items, but with no difference in scores on multiple-choice items (Breland, Danos, Kahn, Kubota, & Bonner, 1994).

Item Format

How this issue impacts the content area of mathematics has received insufficient attention in the literature. Unfortunately, for the few studies that have dealt with gender differences in performance related to mathematics test item
format and content, the findings do not appear to be stable across grade levels, nor are all research results consistent. Using data from a Swedish sample of students from TIMSS, Webster and Henriksson (2000) found that middle-school-aged females outperformed their male counterparts on both multiple-choice and open-ended items. Although it was hypothesized that open-ended items requiring communication would provide an advantage for girls, the gap between boys and girls was actually narrower for open-ended items than for multiple-choice items. Contrary to these results, Fierros (1999), using United States data from TIMSS, found no significant gender differences for multiple-choice, short-answer, or constructed-response items for eighth-grade students, but twelfth-grade males outperformed females on the constructed-response items that are related to verbal fluency. These contradictory findings point to the need for further research. Wilson and Zhang (1998) found similar results and concluded that boys significantly outperformed girls on constructed-response items at two of four grade levels on a statewide assessment. On multiple-choice items, boys outscored girls in grades 3 and 8, but there were no significant differences in grades 5 or 10. As a follow-up to the 1998 study, Zhang, Wilson, and Manon (1999) delved deeper into the solution processes used by boys and girls in constructed-response questions. Their assessment included two extended-response tasks. The egg task challenged students to determine how many egg cartons would be required to hold the eggs that two children had colored. The jellybean task was
more difficult, and required students to estimate the total number of jellybeans in a jar, given the number of small cups the jar of beans could fill and the count of beans in a few of the cups. They found no significant gender differences in the level of problem-solving sophistication or performance with the jellybean task, but with the egg task, more boys than girls used the most sophisticated approach, yet girls received higher scores. Pomplun and Capps (1999) compared students' holistic and objective scores on questions requiring communication versus those requiring either reasoning or problem solving but not communication. At the seventh- and tenth-grade levels, girls received higher holistic scores than boys on the constructed-response items requiring communication, even though boys provided more correct answers. This was because girls created more figures and provided more mathematics reasoning to report their solution procedures, and the holistic scoring rubric credited them for this effort. In the seventh-grade sample, boys received higher holistic and objective scores on reasoning questions, but girls outscored boys in problem solving. There were minimal gender differences at the tenth-grade level for these items, except that boys received higher objective scores on questions requiring reasoning but not communication. Lane, Wang, and Magone (1996) studied gender differences on constructed-response items that asked students to show their solution strategies and/or explain their reasoning. They found that the two tasks that favored males included a figure in the problem situation, and the four tasks that favored females
did not include a figure. Girls provided more complete reasoning for their answers and fewer missing responses. As with the Pomplun and Capps (1999) study, male students were at a disadvantage in the holistic scoring because they were not as complete in showing their solution processes and providing explanations. By combining the results from Pomplun and Capps (1999) and Lane et al. (1996), it might be concluded that girls are more independent problem solvers than boys. Boys performed better when they were provided with a diagram or figure, yet girls were more likely to create their own figures. Three studies compared how boys and girls solve problems as measured against an established problem-solving heuristic. McCoy (1994) found no significant gender differences in the extent to which students employed the four steps of Polya's problem-solving process. Zambo and Follman (1994) first examined how middle school girls and boys would perform on questions presented on an algorithmic-type worksheet outlined with nine problem-solving steps. They found a small but significant difference in scores in favor of girls, and hypothesized that the algorithmic nature of the worksheet was more useful to girls than to boys. In a follow-up study, Zambo and Hess (1996) compared girls' and boys' performance on two forms of the test, one unformatted, and one formatted with the nine problem-solving steps. Although there were no significant gender differences on either form of the test, girls who were given the unformatted test before the formatted test improved more than boys. Conversely, girls who were given the formatted test before the unformatted test declined in performance compared to boys.
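The contrast between holistic and objective scoring that runs through these studies can be made concrete with a small sketch. The two-part response format, the point weights, and the scoring rule below are illustrative assumptions only; they are not the rubrics used by Pomplun and Capps (1999), Lane et al. (1996), or the present study.

def objective_score(answer_correct):
    # Objective scoring credits only the correctness of the final answer.
    return 1 if answer_correct else 0

def holistic_score(answer_correct, shows_work, explains_reasoning):
    # Toy holistic rubric: hypothetical weights of 2 points for the answer and
    # 1 point each for shown work and a written explanation, chosen only to
    # illustrate how credit for communication can offset a missing or
    # incorrect final answer.
    return 2 * int(answer_correct) + int(shows_work) + int(explains_reasoning)

# Hypothetical responses: a bare correct answer versus an incorrect but fully explained attempt.
print(objective_score(True), holistic_score(True, False, False))   # 1 2
print(objective_score(False), holistic_score(False, True, True))   # 0 2

Under such a rule, a student who writes little but answers correctly and a student who answers incorrectly while showing work and explaining the reasoning can receive the same holistic score, which is consistent with the pattern of boys providing more correct answers while girls earned equal or higher holistic scores.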


Item Content

With regard to item content, two studies examined whether gender labeling and content could explain gender differences in performance. Walsh, Hickey, and Duffy (1999) manipulated the gender labeling (female character, male character, or gender neutral) to see whether this would differentially impact the performance of middle school and university-level males and females. Although overall scores were higher on male-labeled items, the labeling did not result in any significant gender differences at either age level. The only significant difference was that, at the college level, women scored lower than men when they were told the test had previously shown gender differences, but scored the same as men when told the test was merely comparing Canadian and American students. Chipman, Marshall, and Scott (1991) obtained similar results with undergraduate students. They found no gender differences on feminine and neutrally worded questions, but a small significant difference in favor of males on questions with masculine characters. Therefore, it appears that boys are more susceptible to performing differently on problems based on gendered wording, and girls are more generalized in their thinking and are not as affected by gendered wording. In light of the apparent advantage girls have on items requiring linguistic explanation, more research is needed into the role that gender differences in verbal fluency play in the holistic scoring of constructed-response items. It must also be noted that because girls and boys perform differently depending on item format, tests that contain only one type of problem may not provide a fair
assessment of students' capabilities. Because the only study including elementary-aged participants (Wilson & Zhang, 1998) showed that gender differences in performance according to item format develop over time, more research is needed with young children to attempt to understand the roots of the differences that emerge in the middle school years. The limited and contradictory findings from research on girls' and boys' performance related to the gendered wording of problems also point to the need for further study.

Gender Differences in Strategy Use

Several researchers have devoted themselves to determining what, if any, gender differences exist in young children's problem-solving strategies, and to what extent these differences may presage performance differences that ultimately emerge in middle and high school. If, in fact, significant and consistent differences emerge, they may indicate the need for a gender-differentiated curriculum. For the purposes of this discussion, strategies are defined as the plans, methods, or maneuvers the student utilizes during the problem-solving process. Fennema et al. (1998) investigated gender differences in problem-solving strategies used by students as they progressed from first through third grade. Their interview data revealed no significant differences in the number of correct answers for grades one and two, but in third grade, boys solved significantly more complex problems than girls. The most important finding was that there were significant gender differences in strategy use for all three years of the study. Girls tended to use more concrete strategies such as modeling with manipulatives or counting, whereas boys used more abstract strategies such as
invented algorithms. Invented algorithms are those that the child constructs, without specific instruction, using his/her number sense and understanding of the problem situation (Van de Walle, 2001). Carr, Jessup, and Fuller (1999) observed similar results with first-grade students. Boys correctly used retrieval strategies, pulling information from memory, more than girls. Girls relied more on overt strategies such as counting with manipulatives or fingers. Their further analysis of how parents and teachers may contribute to these differences revealed that both teachers and parents were more likely to direct boys to use retrieval strategies. Carr and Davis (2001) followed up on their earlier work by examining whether the differences in first-grade boys' and girls' strategy use were more a reflection of ability or preference. When given free choice, boys and girls were equally able to solve the problems correctly, but girls were more likely than boys to use manipulatives, and tended to use the most elementary strategies for addition and subtraction, representing all of the items being counted and counting each item separately. When the researchers controlled strategy, requiring students to use retrieval or manipulatives on specific problems, boys outperformed girls. Boys showed that they were equally able to use retrieval strategies and manipulatives, while girls were less able to use retrieval strategies. Therefore, early gender differences in generalized strategy use appear to reflect differences in both ability and preference. Gallagher and DeLisi (1994) found that gender differences in strategy use continue through high school, even in testing situations that reflect no overall
difference in scores. Using a think-aloud protocol with the mathematics portion of the Scholastic Aptitude Test with high-ability students, they found that females did better on conventional problems, but males did better on unconventional problems. Females relied more heavily on conventional strategies that were taught in the classroom, whereas males were more likely to use unconventional strategies. These findings demonstrate that gender differences in strategy use exhibited as early as first grade appear to remain consistent throughout high school. Different results were found in a pair of follow-up studies completed by Gallagher et al. (2000). In a small study of high-ability students, females were more successful with conventional problems than with unconventional items, yet males' performance did not vary with problem type. Conventional problems are routine textbook problems that can be answered by algorithmic methods; unconventional problems require an unusual use of a familiar algorithm, estimation, or insight. With free-response items, females did better on unconventional problems compared to conventional ones, but males did better with conventional items than they did with unconventional ones. In comparing the scores of males and females, males outscored females on conventional items, but there was no significant difference in scores on unconventional items. In a larger study with students of varied ability, Gallagher et al. found no interaction between gender and item format or problem type. Additional work is needed to understand why differences found in the small sample were not replicated in the larger sample, and why performance according to problem type
(conventional versus unconventional) was not consistent across item format (multiple-choice versus free-response).

Gender Differences in Affect

The impact of affect on students' mathematical lives is an area of study that has drawn a great deal of research attention due to the multitude of affective differences between boys and girls in this content area, together with the heavy influence of affective factors on children's problem-solving performance (McLeod, 1989). The most studied element of affect is student attributions: perceptions of the relationship between actions and outcomes. A study conducted in the Netherlands (Seegers & Boekaerts, 1996) found that upper-elementary girls showed a stronger tendency than boys to attribute failure to lack of ability, a feeling that was increased by a fear of making mistakes. Girls also had less favorable beliefs about their mathematical ability than boys. Boys who were highly task-oriented were less likely to attribute failure to lack of capacity, but the opposite was true for highly task-oriented girls. Boys were more confident than girls about their mathematical abilities, regardless of their actual achievement. Another article on the same study (Vermeer, Boekaerts, & Seegers, 2000) additionally reported that girls showed higher persistence than boys during applied problem solving, indicating that the previously held belief of girls' learned helplessness was not supported by the sample. Stipek and Gralinski's (1991) study of third- and eighth-grade students showed that boys attributed a good outcome to ability significantly more often than did girls, whereas girls attributed a poor outcome to their lack of ability
significantly more often than boys. Girls were also less likely to believe that hard work would lead to success, and eighth-grade girls were more likely to hold this belief than third-grade girls. In third grade, boys and girls had similar expectations for success, but by eighth grade, boys had become more optimistic than girls. Overall, girls rated their ability lower and expected to do less well than boys on tests. Most of these findings are corroborated by Forgasz and Leder's (1996) study of seventh-grade students. They found that, compared to males, females attributed success to ability to a lesser extent, attributed success to effort to a greater extent, attributed failure to task difficulty to a greater extent, and rated their achievement levels lower. Similar results were obtained with third-grade students in that girls attributed their success to effort more than boys (Michaels, 2002), and that boys attributed their success to ability more than girls (Farrand, 2002). A longitudinal study (Tartre & Fennema, 1995) that assessed a random sample of students as they progressed through the 6th, 8th, 10th, and 12th grades found no significant gender difference in achievement, but discovered that males stereotyped mathematics as a male domain significantly more than females; although the stereotype was not related to achievement, it increased during each year of the study. For females in the middle grades, less stereotyping was positively correlated to mathematics achievement. Further analysis regarding the possible relationship between male stereotyping and the existing high proportion of males in advanced mathematics classes is in order.


As opposed to many other aspects of gender differences in mathematics, the research findings on affective differences between the sexes are fairly consistent. In general, girls are less likely than boys to attribute success to ability, but more likely to blame lack of success on low ability. Girls are less mathematically confident than boys, but their achievement is enhanced when they fail to stereotype mathematics as a male domain. Boys see mathematics as their domain more strongly as the years progress, regardless of their achievement. Whether this view is a causal factor in the waning numbers of girls in advanced classes, or simply a reflection of it, is yet to be seen.

Summary of Gender Differences in Mathematics

The research studies highlighted in this section on gender differences in mathematics are summarized in Table 1. In the United States, girls tend to do as well or better in the elementary years, after which time boys outshine girls (Ai, 2002; Hyde et al., 1990; Leahey & Guo, 2001; Park et al., 1998). Cross-national studies show a gradual decrease in gender differences over time, with males still outperforming females, especially in countries where women have less opportunity (Baker & Jones, 1993; Fierros, 1999; Lemke et al., 2001). Regarding item format and content, the gendered wording of questions did not result in gender differences in performance (Chipman et al., 1991; Walsh et al., 1999), but differences existed according to item format in most studies (Fierros, 1999; Lane et al., 1996; Pomplun & Capps, 1999; Webster & Henriksson, 2000; Wilson & Zhang, 1998; Zambo & Follman, 1994; Zambo & Hess, 1996; Zhang et al., 1999). Although few differences in achievement exist in the early school years, young children exhibit strong gender differences in their strategy use during problem solving (Carr et al., 1999; Carr & Davis, 2001; Fennema et al., 1998). Gender differences in affect were fairly consistent across studies, with boys attributing success more to ability and exhibiting more confidence than girls. Girls, on the other hand, tend to attribute their success to effort and their failure to lack of ability (Farrand, 2002; Forgasz & Leder, 1996; Michaels, 2002; Seegers & Boekaerts, 1996; Stipek & Gralinski, 1991; Tartre & Fennema, 1995).

Table 1
Results of Mathematical Gender Differences Studies Grouped by Topic

Performance Differences in the United States

Ai, 2002. Participants: 3,116 7th graders. Results: Low-ability boys started lower than girls but improved faster; no difference between high-ability boys and girls.

Hyde et al., 1990. Participants: Meta-analysis of 100 studies, 3,985,682 students. Results: Slight female advantage in elementary and middle school, with a moderate male advantage in high school and beyond.

Leahey & Guo, 2001. Participants: NLSY, 4,126 children ages 4-13; NELS, 9,787 children ages 14-18. Results: Girls outscore boys until age 11, then boys' growth exceeds girls'. Boys' standard deviation in scores is higher than girls' at all ages.

Park et al., 1998. Participants: 4,661 4th graders (2,421 girls, 2,240 boys); 2,541 6th graders (1,268 girls, 1,273 boys). Results: Girls outscored boys in computation. Boys outscored girls on problems of mathematics applications.

Thompson et al., 1992. Participants: 86 4th graders (47 girls, 48 boys). Results: Girls benefited more than boys in problem solving and performance from intervention training in mental computation.

Performance Differences in Cross-National Studies

Baker & Jones, 1993. Participants: 77,000+ 8th graders in 19 countries, FIMS and SIMS data. Results: Gender differences are decreasing over time, especially in countries where women have opportunity.

Ethington, 1992. Participants: 746 U.S. 8th graders from SIMS data. Results: Prior achievement and value impact males' scores. Prior achievement, family help, stereotyping, and perception of difficulty impact females' scores.

Fierros, 1999. Participants: 8th and 12th graders in 34 countries, TIMSS data. Results: 12th-grade males did better on math literacy and advanced math. High-ability males outperformed high-ability females.

Lemke et al., 2001. Participants: 15-year-olds in 32 countries, PISA data. Results: Boys outscored girls in mathematics literacy in 29 of 31 countries. The gender differences were not significant in the U.S.

Gender Differences in Test Item Performance

Chipman et al., 1991. Participants: 256 undergraduates (128 females, 128 males). Results: There were no gender differences on feminine or neutrally worded questions. Men did slightly better than women on masculine-worded questions.

Fierros, 1999. Participants: 8th and 12th graders in 34 countries, TIMSS data. Results: 12th-grade males scored higher than females on constructed-response items. No significant differences were found on other items at the 8th or 12th grade levels.

Lane et al., 1996. Participants: 460 6th and 7th graders (250 girls, 210 boys). Results: Boys scored higher on constructed-response items that included a figure; girls did better on items with no figure. Girls showed their work more than boys.

McCoy, 1994. Participants: 90 2nd and 3rd graders (44 girls, 46 boys). Results: No significant gender differences in students' use of the steps of Polya's problem-solving process.

Pomplun & Capps, 1999. Participants: 438 7th and 10th graders (219 girls, 219 boys). Results: Girls received higher holistic scores on items requiring communication even though boys provided more correct answers.

Walsh et al., 1999. Participants: 63 7th and 8th graders (27 girls, 36 boys) and 174 undergraduates (96 women, 78 men). Results: No gender differences in questions worded with masculine, feminine, or neutral characters. Men outscored women when told the test had previously found gender differences.

Webster & Henriksson, 2000. Participants: 8,851 Swedish 6th-8th grade students (4,334 girls, 4,517 boys). Results: Girls outperformed boys on multiple-choice and open-ended items.

Wilson & Zhang, 1998. Participants: 29,809 students in the 3rd, 5th, 8th, and 10th grades. Results: On constructed-response items, boys scored higher at grades 5, 8, and 10. On multiple-choice items, boys scored higher at grades 3 and 8. No significant differences for other grades.

Zambo & Follman, 1994. Participants: 302 6th and 8th graders (149 girls, 153 boys). Results: Girls performed better on a test formatted with nine problem-solving steps.

Zambo & Hess, 1996. Participants: 155 6th graders (77 girls, 78 boys). Results: Girls benefited from the formatted test, but declined when moving from a formatted to an unformatted test.

Zhang et al., 1999. Participants: 300 3rd graders (150 girls, 150 boys). Results: On one of two constructed-response items, girls outscored boys even though boys used a more sophisticated approach.

Gender Differences in Strategy Use

Carr et al., 1999. Participants: 92 1st graders (46 girls, 46 boys). Results: Boys used more retrieval strategies; girls used more overt strategies.

Carr & Davis, 2001. Participants: 84 1st graders (42 girls, 42 boys). Results: When strategy use is controlled, boys outscore girls because they are more proficient with retrieval strategies.

Fennema et al., 1998. Participants: 82 children followed for 3 years, 1st through 3rd grades (38 girls, 44 boys). Results: Few differences in performance, but significant differences in strategy use for 1st-3rd grades. Girls used more concrete strategies; boys used more abstract strategies.

Gallagher & DeLisi, 1994. Participants: 47 high-ability high school students (22 girls, 25 boys). Results: Girls used more conventional strategies and performed better than boys on conventional problems. Boys used more unconventional strategies and did better than girls on unconventional problems.

Gallagher et al., 2000. Participants: Study 1: 28 high-ability high school students (14 girls, 14 boys). Study 2: 154 high school students (94 girls, 60 boys). Results: Study 1: with multiple-choice items, girls do better with conventional items, and boys score the same on both problem types; with free-response items, girls do better with unconventional items and boys with conventional items. Study 2: no gender interaction with problem type (conventional vs. unconventional) or item format (multiple-choice vs. free-response).

Gender Differences in Affect

Farrand, 2002. Participants: 62 3rd graders. Results: Boys attribute success to ability more than girls.

Forgasz & Leder, 1996. Participants: 782 7th graders (396 girls, 386 boys). Results: Girls more likely than boys to attribute failure to lack of ability and success to effort.

Michaels, 2002. Participants: 109 3rd graders. Results: Girls attribute success to effort more than boys.

Seegers & Boekaerts, 1996. Participants: 186 11- to 12-year-old students (96 girls, 90 boys). Results: Boys are more confident than girls; girls are more likely to attribute failure to lack of ability.

Stipek & Gralinski, 1991. Participants: 194 3rd graders (94 girls, 100 boys); 279 8th graders (143 girls, 136 boys). Results: Boys attribute success to high ability. Girls attribute failure to lack of ability.

Tartre & Fennema, 1995. Participants: 60 students (32 girls, 28 boys), data collected at grades 6, 8, 10, and 12. Results: Boys stereotype mathematics as a male domain significantly more than girls.

Reading in Problem Solving

The Relationship Between Reading and Mathematics

Although mathematical language differentiates itself in many key areas from ordinary language, there are strong similarities between literacy and mathematical literacy. Just as the intricacy and value of mathematical language are only now being recognized, reading was, until early in the twentieth century, considered a simple skill of connecting individual words to understand text. Thorndike (1917) was one of the first educational psychologists to delve deeper into the process of reading comprehension. He acknowledged the similarities between reading comprehension and mathematical problem solving:

Understanding a paragraph is like solving a problem in mathematics. It consists in selecting the right elements of the situation and putting them together in the right relations, and also with the right amount of weight or influence or force for each. The mind is assailed as it were by every word
in the paragraph. It must select, repress, soften, emphasize, correlate and organize, all under the influence of the right mental set or purpose or demand. (pp. 327-328)

The language of mathematics shares many characteristics with other languages: symbols and rules are uniform and consistent, understanding increases with practice, novice learners need translations and interpretations, meaning is influenced by symbol order, and communication requires encoding and decoding (Wakefield, 2000). When we miss opportunities to teach mathematics as a language, our students may fail to notice the underlying concepts that lead to understanding (Adams, 2003). Empowering students mathematically requires helping them make the connection between the language used to communicate mathematics and their construction of mathematical knowledge (Earp & Tanner, 1980; Miller, 1993). The language of mathematics interacts with everyday language in problem solving. Rarely do story problems include the mathematical terms add, multiply, divide, or subtract, but the language used in story problems to describe the problem situation implies these mathematical terms and guides the reader in the solution process (Capps & Pickreign, 1993; Hegarty, Mayer, & Monk, 1995). This connection between problem solving and mathematical language must be kept in mind when assessing students' problem-solving performance. If a word problem is solved incorrectly, there is no unambiguous way to conclude that the error is due to a lack of mathematical proficiency rather than a failure in reading comprehension (Clarkson & Williams, 1994).
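To illustrate the kind of translation this passage describes, consider a hypothetical story problem and the number sentence it implies. The problem text, the mapping, and the code below are constructed for this sketch only and are not items or procedures from the study.

# Hypothetical story problem: the phrase "gives away ... how many are left" implies
# subtraction even though no operation word such as "subtract" appears in the text.
problem = "Maria has 12 stickers. She gives away 5 stickers. How many stickers does she have left?"

# The reader's translation step: situation language -> quantities and an operation.
start, change = 12, 5      # quantities named in the problem
answer = start - change    # the separating action implies subtraction

print(f"{start} - {change} = {answer}")  # prints: 12 - 5 = 7

A reader who misconstrues the situation, for example by treating the giving-away action as joining rather than separating, would instead compute 12 + 5, which is one reason an incorrect answer by itself cannot show whether the error lies in computation or in reading comprehension.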


We again see the connection between reading and problem solving in various problem-solving heuristics. Older heuristics (Polya, 1957; Schoenfeld, 1983) used either the word "read" or the word "understand" in the first step, but did not focus on the connection between the two. In contrast, Krulik and Rudnick's (1996) more current problem-solving heuristic begins with the step "read and think." This heuristic highlights the inseparable link between reading the problem and the initiation of critical thinking. During this stage, the reader translates the problem into her/his own language, examines and evaluates the facts of the problem, determines what question is being asked, and connects the parts of the problem to form a visualization of the situation.

Challenges of Reading Mathematical Text

Complexity of Word Problems

Children are more likely to have language-related difficulties in the mathematics classroom than in any other content area (MacGregor, 1990). Although most students enter the mathematics classroom knowing how to read, few have developed the strategies necessary to use their reading skills to learn mathematics content (Barton, Heidema, & Jordan, 2002). Even those students who are successful readers in other content areas and have strong computational skills may struggle with reading mathematics text and working word problems (Manning, 1999; Muth, 1997). Word problems are unique in that they require the use of two language systems simultaneously; students must read while at the same time thinking abstractly about mathematics (Manning, 1999).


Students often struggle when they are asked to read and solve word problems, even when they have no difficulty solving the problem when it is translated for them (McIntosh, 1997; Schell, 1982). Reading the words in problems requires much more than just decoding: students must analyze the information in the problem; translate and use symbols; identify necessary, insufficient, or extraneous information in the text; and employ a strategy that will aid them in solving the problem (Schell, 1982). These challenges are confirmed by the fact that children perform ten to thirty percent worse on arithmetic word problems than on comparable problems presented in numerical format (Kiplinger, Haug, & Abedi, 2000). Another difficulty stems from the fact that reading and mathematics skills do not develop at the same rate in young children. For problems presented in numerical format with the operation signs included, children who understand the algorithm can go right to work. When the same problem is embedded in words and sentences, students must first comprehend the language of the text before they can employ an appropriate algorithm (Choi & Hannafin, 1997; Fuentes, 1998).


point. The readers goal is to sift and filter the information to determine the most essential elements. In contrast, the language of story problems often hides, implies or assumes the mathematical concepts (Fuentes, 1998). The readers goal must be to expand the compact text and unpack meaning. Every word and abstract symbol must be read and fully understood, there are rarely context clues to point to the meaning of unfamiliar words (Fuentes, 1998; Hunsader 2002a), and many of the vocabulary words are either unique to mathematics or have different meanings than the same words in everyday usage (Fuentes, 1998). Story problems may be much more similar to narrative text than to expository text in that they tell a story. However, very little is known about how the narrative text structure functions in the problem solving process. Research into problem wording (Bernardo, 1999; Fan et al., 1994; Leong & Jerred, 2001) makes clear the existence of a relationship between text structure and childrens ability to solve problems, but the nature of the relationship is relatively unexplored territory. Reading story problems also requires comprehension strategies that may not be used in other content areas. Comprehending mathematical text requires an understanding of not just the words, but also the signs, symbols, and graphics. Mathematical information is often presented in unfamiliar ways, not only right to left, but also left to right (number lines and long division), top to bottom (tables and fractions), bottom to top (vertical multiplication) and even diagonally (graphs) (Adams, 2003; Barton, Heidema, & Jordan, 2002). In addition, whereas the authors main idea in a language arts passage is often the 47

PAGE 63

first sentence, the main idea of a story problem is often not presented until the end. Readers have to sift through all of the problem details before knowing the point of the problem (Barton et al.). For these reasons, mathematics is widely acknowledged as the most difficult content area reading material, with more concepts per word, per sentence, and per paragraph than any other area (Schell, 1982, p. 544). Yet teachers of mathematics often report that they are the least prepared to teach students how to read to learn: "I'm a math (science) teacher, I wasn't trained to teach reading" (Barton et al., p. 24). Just as with reading other forms of text, comprehension of mathematical story problems is aided when students can create visual pictures in their heads about the problem situation, interact with the problem by calling on prior knowledge and experience, and bring meaning to the text instead of expecting meaning to be inherent in the words (Tovani, 2000).

Vocabulary

Mathematical vocabulary also complicates the ease of reading mathematical text (Jones, 2001; Thompson & Rubenstein, 2000). Some mathematical terms are shared by everyday English and may or may not have the same meaning (Thompson & Rubenstein). Mathematical terms such as quotient, divisor, and rhombus name concepts that have no unique real-world representation (Miller, 1993). Because these terms are found only in mathematics, students have no opportunity to acquire or practice the use of the terms outside the mathematics classroom (Capps & Pickreign, 1993; Thompson & Rubenstein; Usiskin, 1996). Some words have more than one mathematical
meaning, and yet others, such as volume, yard, count, product, and range, have different meanings in mathematics than in everyday language (Thompson & Rubenstein; Wardrop, 1993). Still other mathematics words sound like English words but are homophones and have vastly different meanings than their counterparts. Examples of this include plane vs. plain, one vs. won, sum vs. some, and whole vs. hole (Adams, 2003; Thompson & Rubenstein). Educators must remember that mathematical understanding and vocabulary learning are interdependent, and that instruction that does not address vocabulary is incomplete (Thompson & Rubenstein).

Symbols

Mathematical vocabulary and mathematical symbols go hand-in-hand. To understand mathematical vocabulary, students must learn to recognize symbols, connect the vocabulary word that names each symbol, remember that the word and the symbol have the same meaning and are pronounced the same, and finally, understand the concept behind the word and the symbol (Reehm & Long, 1996). It is the relationship between these elements that allows students to make sense of the problem context and translate the text of the problem into the symbols used to solve the problem (Adams, 2003). Symbols embedded in mathematical text communicate meaning that must be interpreted during the reading process (Adams, 2003). This is no small task because mathematical symbols such as +, ×, <, and % are like a foreign language to children. As an added complication, there is no phoneme-grapheme relationship to aid in decoding because, linguistically, the symbols are
ideographs (Reehm & Long, 1996). The context of mathematical language and symbols can also have a great impact on comprehension and understanding. For example, the numeral 2 is interpreted to have a very different meaning in each of several contexts, such as in 2, 24, and 21 (Capps & Pickreign, 1993). Therefore, the meaning of each symbol is largely determined by its context, and it must be instantiated (Anderson et al., 1976) as a more specific exemplar of the general concept of the numeral. If students are unable to decenter from a fixed meaning for a symbol, understanding will be negatively impacted.
Other Challenges
Not only do the differences in mathematical text together with vocabulary and symbols make for difficulties in comprehending mathematical problems, but also there are numerous other characteristics that further complicate children's reading process. To begin, mathematics text is presented in a variety of formats: pictures, graphs, symbols, and words (Reehm & Long, 1996). The wording of a word problem also relates to its difficulty. Some problems are not worded consistently or explicitly (Mosenthal & Kirsch, 1993), so that the mathematical nature of the problem may not be readily apparent (Adams, 2003). Given a picture with five birds and three worms, the question "How many more birds are there than worms?" is much more difficult than the question "How many birds will not get a worm?" The difficulty is obviously not in decoding, because the difficulty of the individual words in both questions is similar, but the linguistic complexity of the first question results in many more students giving incorrect responses (Reed, 1999). Comprehension of the semantic information
contained in the problems is informed by children's existing schemata that represent their knowledge of the story problem discourse, and the topic and context of the problems (Van Dijk & Kintsch, 1977). Comprehension of mathematical story problems, just as with expository or narrative text, involves an interaction between the reader, the text, the author, and the context (Tierney & Mosenthal, 1985). Therefore, if the way a story problem is worded elicits recall of a similarly worded problem or a previous real-world experience, the child will more likely comprehend the problem. The question stating, "How many birds will not get a worm?" may be more easily understood than "How many more birds are there than worms?" because it resonates with young children and their real-world experiences with sharing and being included. The second question requires students to make a comparison judgment of "more than," a situation that is less common in the early childhood experience. The differences in the wording of the two questions also signal different means of finding the answer. The question beginning "How many" signals the child to form a one-to-one correspondence between birds and worms, then count only the remaining two birds. The question beginning "How many more" eventually signals the need for a comparison between the numbers five and three, but the question cannot be fully understood until the end of the sentence, where the child encounters the words "than worms." Although from an adult's perspective the two problems involve the same mathematical computation, from the child's perspective, the linguistic presentation of the two questions creates two distinctly different problems that are solved in different ways. Because
one-to-one correspondence and counting are early number concepts that can be understood at an earlier developmental level than the comparison involved in finding "how many more than" (Van de Walle, 2001), the question asking "how many" is more easily answered. The ability to read word problems to determine what information is necessary, extraneous, or missing is a skill that requires instruction, guidance, and practice (Adams, 2003). The density and complexity of mathematical text (Wardrop, 1993) make the process of reading and constructing an appropriate mental model (Hegarty, Mayer, & Monk, 1995) as much an issue in problem solving as performing the mathematical operations that are described in the reading (Mosenthal & Kirsch, 1992). In order for students to become independent learners, the structure of mathematical language must be explicitly taught, just as the structure of narrative and expository text is explicitly taught (Reehm & Long, 1996).
Readability
No discussion of the complexity of reading in mathematics would be complete without a mention of readability. Although the amount of research and writing on the subject of mathematics readability was abundant from the late 1940s through 1980, the lack of it since that time, especially in the last ten years, is disturbing. With the current trends in mathematics instruction that emphasize solving problems in context, together with research that solidifies the connection between reading comprehension and mathematical problem solving, it would
stand to reason that there is now an even more compelling rationale for research into readability, especially as it relates to mathematics, than ever before. Both mathematical and non-mathematical vocabulary, the number of words in the problem, the length of individual sentences, the length of the problem statement, and the grammatical structure of the problem all impact readability and difficulty (Clarkson & Williams, 1994; Fan, Mueller, & Marini, 1994; Kiplinger, Haug, & Abedi, 2000). Most readability formulas (Dale & Chall, 1949; Flesch, 1948; Fry, 1969) rely on a limited number of factors and are only designed for use with running text of at least 100 words. When using these formulas, the rating process excludes analysis of mathematical text presented through mathematical symbols rather than English, and fails to acknowledge the complexity of reading text that is densely packed with mathematical jargon (Kane, 1970). Two formulas that were designed for short passages such as those used in mathematical word problems are the Short Passage Readability Formula (Fry, 1990), which is simply an adaptation of the Fry Readability Graph, and the more comprehensive Homan-Hewitt Readability Formula (Homan, Hewitt, & Linder, 1994), which uses a combination of words per sentence, clauses per sentence, word length, and word familiarity (Helwig, Rozek-Tedesco, Tindal, Heath, & Almond, 1999). Although these formulas give a rough estimate of a problem's readability level, they do not account for all of the complexities associated with mathematical text described earlier.
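To make this limitation concrete, the sketch below applies one of the conventional formulas cited above, the Flesch (1948) Reading Ease index, to a short story problem. The formula itself is standard, but the syllable counter is a rough heuristic and the sample problem is invented for illustration; the point is simply that the calculation sees only the English words and silently discards the mathematical symbols, which is exactly the kind of complexity such formulas were never designed to capture.

import re

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels in the word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    # Flesch (1948): 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)  # symbols such as 1/2 are dropped
    syllables = sum(count_syllables(w) for w in words)
    return (206.835 - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# A hypothetical story problem; the fractions never enter the calculation.
problem = "Ann has 1/2 of a pizza. She eats 1/4 of it. What fraction is left?"
print(round(flesch_reading_ease(problem), 1))

The score that is printed reflects only the running English text; the fractions, which carry most of the mathematical load, contribute nothing to the estimate.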

The fault does not lie with the readability formulas themselves, but with the application of the formulas to contexts for which they were not intended. Hater and Kane (1975) worked to develop a formula that would be easy to apply, yet be appropriate for assessing the readability of mathematical English. They adapted the cloze procedure by replacing every fifth word or symbol in the text, tables, and figures with a blank, and tested the method using five passages with over 1,700 middle and high school students. They found it to be a reliable means of assessing the reading difficulty of mathematical English. The challenge with this method lies in its lack of practicality. Unlike its predecessors, it is not a formula that can be applied to a passage to determine a grade-specific level of difficulty. Assessing the readability of any passage requires administering a cloze test to a large number of participants and analyzing the results, a daunting task for most applications.
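Producing such a modified cloze passage is itself mechanical, as the brief sketch below illustrates; every fifth token is blanked, whether it is a word or a mathematical symbol. The tokenization and the sample sentence are assumptions made for illustration, not Hater and Kane's actual materials.

def cloze_passage(tokens, n=5):
    # Replace every nth token (word or mathematical symbol) with a blank,
    # following the spirit of Hater and Kane's adapted cloze procedure.
    return " ".join("_____" if (i + 1) % n == 0 else token
                    for i, token in enumerate(tokens))

# Tokens of a hypothetical fragment of mathematical English, symbols included.
tokens = "The sum of 3 and 5 is 8 , so 8 - 5 = 3 .".split()
print(cloze_passage(tokens))
# -> The sum of 3 _____ 5 is 8 , _____ 8 - 5 = _____ .

The practical burden therefore lies not in generating the blanks but in administering the resulting passage to enough students and analyzing their responses, which is the limitation noted above.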

Reading Comprehension and Word Problems
Traditional problem solving focused simply on finding the correct answer. With the newer focus on the use of real-world problem contexts, problem solving now requires that students comprehend text well enough to interpret meanings embedded in the context (Basurto, 1999). Unfortunately, students often embark on problem-solving expeditions without realizing the importance of reading for comprehension (Bratina & Lipkin, 2003; Kiplinger, Haug, & Abedi, 2000). Current notions of reading involve much more than just decoding text. The focus is now on how the reader creates meaning as a result of the transaction between the text and the reader (Rosenblatt, 1978; Ruddell, 1997). The meaning the reader creates is dependent on his/her prior knowledge (Draper, 2002; Fuentes, 1998; Rosenblatt, 1978) and experience with the information presented in the text, together with the reader's ability to make sense of the signs and symbols inherent in the text. Students engaged in problem solving activities may need assistance with reading the text if they lack mathematical content knowledge or an understanding of how to interpret, use, or manipulate mathematical signs and symbols. As mathematics teachers are the most knowledgeable about reading and creating mathematical text, they are in the best position to help students engage in this kind of literacy (Draper, 2002). Simply being able to read the words in a problem does not yield comprehension if the reader does not understand the unique meanings implied by the mathematical context, language, or symbolism (Capps & Pickreign, 1993; Pau, 1995). Reading is composed of the interaction of two distinct processes: decoding and comprehension. Reading comprehension requires the reader to reach far beyond the literal meaning of the printed words. It includes such activities as determining the main idea; inferring the writer's purpose; drawing inferences; and summarizing, integrating, and synthesizing the information (Sovik, Frostrad, & Heggberget, 1999). As with any type of text, students will have greater comprehension of mathematical text if they have a set purpose for reading, make predictions, monitor their comprehension, and summarize the information contained in the reading in their own words (Draper, 2002). For skilled readers, decoding is nearly automatic, allowing the majority of the reader's attention to be focused on comprehension. In contrast to pure
computation problems, students' inability to comprehend the entire problem and context of word problems may lead to a significant number of errors. Arithmetical errors in word problems may indicate a lack of understanding of the text rather than faulty arithmetic. Poor reading comprehension can be related to the child's inability to draw inferences from the text, failure to use working memory to integrate new information into existing schema, or failure to monitor one's own thinking and refer back to the text to gain understanding (Sovik, Frostrad, & Heggberget, 1999). Problem solving assessments that require students to rely on their reading skills to demonstrate their mathematical competency may create a serious disadvantage for low-ability readers who struggle with comprehension (Helwig, Rozek-Tedesco, Tindal, Heath, & Almond, 1999). Students who utilize direct translation as a problem-solving strategy focus on the key facts and key words in each statement as they read. Direct translation relies on identifying the key numbers and relational terms in the text of a mathematical story problem and using them to calculate a solution; all other information is ignored. This method works well for problems that are consistently worded, with all numbers listed in the order they must be used for computation and all relational words correctly implying the operation to be used. However, when problems are inconsistently worded, direct translation will lead to inaccuracies (Bernardo, 1999; Hegarty, Mayer, & Monk, 1995; Reed, 1999). An example of an inconsistently worded problem is: "Matt has three more dollars than Nancy. Matt has six dollars. How many dollars does Nancy have?" With direct translation, the student would focus on the numbers three and six,
and the relational word "more," incorrectly concluding that the solution requires adding three and six to get nine. Students who develop a holistic view of the problem using a problem-model approach will comprehend the relationship implied in the problem and understand that because Nancy actually has less money than Matt, the problem requires subtraction (Hegarty et al.).
Highlighted Research on Reading and Problem Solving
The research that relates reading to problem solving falls into four main categories: problem wording, reading disabilities, accommodations, and comprehension.
Problem Wording Research
The wording of a mathematical story problem appears to impact the difficulty of the problem. Fan, Mueller, and Marini (1994) used three different wordings of problems requiring students to find the difference between two disjoint sets and found that the problems worded in a format requiring a static comparison between two disjoint sets (compare) were significantly more difficult than those requiring the student to make two disjoint sets equal (equalize) or to find element correspondence between two disjoint sets (won't get). An example of a compare problem is, "John has 9 apples. Ann has 4 apples. How many more apples does John have than Ann?" (p. 359). An equalize problem might ask, "Fred has 9 buckets. Betty has 5 buckets. How many more buckets does Betty have to get to have as many buckets as Fred?" (p. 360). An example of a won't get problem is, "children went to the store to buy hats. There were only 5 hats in the store. How many children would not get a hat?" (p. 360).
Because the only difference among the three types of problems was the problem text, it can be concluded that problem wording impacts the difficulty of problems. One element of the methodology that seriously reduces the generalizability of these results is that the problems were read aloud to the students. Although this method did allow the researchers to ensure that decoding would not confound the results, it also means that the results cannot be used to draw any conclusions about how problem wording impacts problem solving in an environment where students must read for themselves. Hegarty, Mayer, and Monk (1995) compared the reading processes of those students who make errors in solving inconsistently worded problems with those who do not make errors. By definition, an inconsistently worded problem is one for which using the numbers in the order they are presented and applying an operation based simply on a key word such as "more" will not result in a correct solution. By monitoring students' eye fixations during the problem solving process, they were able to establish a correlation between the amount of attention paid to specific elements of the problem and problem-solving performance. They found that unsuccessful problem solvers fixated on numbers and relational terms significantly more than successful problem solvers, confirming their hypothesis that poor problem solvers rely on direct translation of text to solve problems. Successful problem solvers spent less time reexamining the problem, and when they did look back, it was primarily to examine the context rather than numerical details. This implies that the successful problem solvers were attempting to develop a model of the problem to aid in the solution process.
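The contrast between the two strategies can be made concrete with the Matt-and-Nancy problem quoted earlier. The sketch below is purely illustrative; the "parsing" of the problem is hard-coded rather than the product of a general solver, but it shows how the key-word shortcut and the problem model diverge on an inconsistently worded problem.

# The inconsistently worded problem quoted above:
# "Matt has three more dollars than Nancy. Matt has six dollars.
#  How many dollars does Nancy have?"

def direct_translation(numbers, keyword):
    # Key-word strategy: take the numbers in the order they appear and let
    # the relational word ("more" -> add) dictate the operation.
    first, second = numbers
    return first + second if keyword == "more" else first - second

def problem_model(matt, difference):
    # Problem-model strategy: represent the relation Matt = Nancy + 3
    # and solve it for Nancy.
    return matt - difference

print(direct_translation([3, 6], "more"))   # 9 -- the typical error
print(problem_model(matt=6, difference=3))  # 3 -- the correct answer

It is this difference in processing, rather than any difference in arithmetic skill, that Hegarty et al.'s eye-fixation data were designed to detect.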

Although this study successfully draws conclusions about the behaviors of the adult problem solvers who participated, no evidence is provided to support the conclusion that the same behaviors govern children's problem solving experiences. Further work is needed to determine whether children's problem solving performance would differ based on whether they used direct translation or problem-model strategies. Leong and Jerred's (2001) work serves to fill the gap revealed in Hegarty et al.'s (1995) work regarding consistency of language, and also extends into how the adequacy of information in mathematical problems impacts students' proficiency in problem solving. In agreement with Hegarty et al., they found that students scored significantly better on the consistently worded questions than on inconsistently worded ones. There were also significant main effects for ability level and adequacy of information, with students performing better on questions with just enough information than on those with extraneous information. Because the study was conducted with elementary-aged students who independently read the questions, the results have much greater applicability to the classroom than those of Hegarty et al. They also serve to highlight the critical role played by language and reading comprehension in mathematical problem solving, because variations in wording consistently impacted students' performance. Taking a different approach to the relationship between reading and problem solving, Bernardo (1999) studied the effect on performance of re-wording word problems to make them more explicit. Although his study involved
second to fourth grade students whose primary language was Filipino, the results have important implications for English speakers also. The subjects were given a test containing similar problems presented in four formats: conventional set in English, conventional set in Filipino, re-worded set in English, and re-worded set in Filipino. There was a statistically significant main effect for problem wording, with students doing much better on the re-worded problems, as well as a statistically significant effect for language, favoring the problems presented in Filipino. The interaction effect was also statistically significant, indicating that for problems written in English, re-wording did not provide much benefit, but for problems written in Filipino, rewording resulted in higher achievement. The results indicate that poor performance in mathematical problem solving may be more a reflection of students reading comprehension ability than mathematical ability. The implication is that students may be more successful with word problem solving and better able to demonstrate their competency when the complexity of the text does not exceed their reading comprehension ability. Disabilities Research Mathematics disabilities have historically garnered much less attention in the research than reading disabilities (Ackerman & Dykman, 1995; Fuchs & Fuchs, 2002; Jordan & Hanich, 2000), but several studies conducted in the last ten years are shedding light on the relationship between the two. Jordan and Montani (1997) examined the computation and problem-solving skills of two subgroups of students with mathematics disabilities: those with reading difficulties and those without. On both number facts and story problems, 60

students with mathematics disabilities who were good readers scored lower than normally achieving students in timed conditions, but not in un-timed conditions. Children with disabilities in both mathematics and reading performed worse than normally achieving students in all tasks: timed and un-timed. As a follow-up to the work by Jordan and Montani (1997), Jordan and Hanich (2000) studied the performance of second-grade children on cognitive tasks associated with mathematics teaching. All children were assessed individually on tasks including number facts, story problems, place value, and written calculations. The results for the story problem tasks are of interest here. The story problems were read aloud to the students and a written version was also provided. The students with mathematics and reading disabilities and those with only mathematics disabilities performed significantly worse than both normally achieving and reading disability only students. Those with only mathematics disabilities performed better than the group with both disabilities. The only task in the study that differentiated between the mathematics disability only group and the normally achieving group was the story problem section of the test. The results of this study are interesting, but several elements of the study design must be considered before drawing any conclusions. By reading the problems aloud, the full impact of reading disabilities on children who must independently read and solve mathematical problems cannot be surmised. The study sample was relatively small, and the researchers did not attempt to gather information about the teaching practices in the three classrooms from which the 61

study participants were drawn. If, in fact, the practices were vastly different, using the individual child as the unit of analysis violates the assumption of independence, causing a serious threat to the robustness of the results. What this study does accomplish is to highlight the necessity of differentiating students with mathematics disabilities from those who also have reading disabilities. Much of the earlier research on this topic confounded these two subgroups, making it difficult to interpret the findings. Jordan, Kaplan, and Hanich (2002) built on this work with a longitudinal study of students through their second- and third-grade years, geared towards understanding the long-term developmental issues of students with mathematics disabilities, with and without co-morbid reading disabilities. Based on their previous work, the researchers predicted that students with both mathematics and reading disabilities would gain mathematical proficiency at a slower rate than children with only mathematics disabilities due to their inability to use reading as a compensatory strategy. Students were assessed using the Woodcock-Johnson Psycho-Educational Battery-Revised test at the outset of the study to assign the children into one of four groups: mathematics disability only, mathematics and reading disability, reading disability only, or normally achieving. The mathematics tasks used in assessment throughout the study were read aloud to the students. Regardless of instructional method, children with both disabilities appeared to learn at a slower rate than those with only a mathematics disability. The most interesting finding is that students who began the study with only reading disabilities completed the study with mathematical performance
levels equal to those students who started the study with mathematics disabilities. The researchers concluded that children with specific reading disabilities are at risk of developing associated mathematics disabilities as they progress through elementary school, believing that some mathematics skills appear to be acquired in a language-specific format. As with Jordan and Hanich's (2000) earlier work, the findings must be interpreted in light of the fact that the students were not required to read the problems themselves. The findings of Ackerman and Dykman's (1995) research are similar to those of Jordan et al. (2002). Ackerman and Dykman studied elementary-aged poor readers from ages seven to twelve who were coded as either reading disabled (RD) or reading and arithmetic disabled (RAD) according to test scores. All students were then given a battery of psychoeducational tests. The results indicated that the RAD students scored lower on measures of sequential memory than RD students, and both groups scored lower than adequate readers. These differences in achievement between RD and RAD students were significantly larger in the upper-elementary students, suggesting that mathematical retardation among poor readers increases with age. Fuchs and Fuchs (2002) saw a weakness in the existing research about the relationship between mathematics disabilities and reading disabilities in problem solving. Previous research was largely confined to a study of student responses to simple, one-step arithmetic word problems. To address this limitation, they conducted a study of students with mathematical disabilities, with and without co-morbid reading disabilities, on a range of mathematical problem-
solving tasks including simple arithmetic story problems (one-step story problems involving sums or minuends of 9 or less), complex story problems (word problems requiring students to select a problem-solving strategy, find relevant information, and use a correct operation), and real-world story problems (presented in a multi-paragraph narrative including graphic and tabular information and requiring students to answer four questions about the problem situation). Students in this study were required to read the problems on their own. An analysis of the three problem types revealed that real-world problems had significantly more words, sentences, words per sentence, verbs, numbers, and math steps than complex story problems, with arithmetic problems having the fewest on all of these measures. The accuracy of all students' responses decreased dramatically across the three problem types, with the highest scores found on arithmetic problems, followed by complex problems, then real-world problems. The students with mathematics disabilities scored 75% on the arithmetic story problems, 14% on the complex story problems, and 12% on the real-world story problems, while the students with both mathematics and reading disabilities scored 55%, 8%, and 5%, respectively. In all cases, the students with reading disabilities scored significantly lower than those without, indicating that reading ability is a significant predictor of mathematical problem solving ability across problem types. In a study of deaf children 8 to 12 years of age, Pau (1995) found that the reading level of the word problem was directly related to children's problem solving abilities. Kelly and Mousley (2001) studied both deaf and hearing college
students to determine the effect of their measured reading levels on their ability to solve mathematics problems that were presented both in numeric and word problem formats. No significant differences were found with the simple problems, but with the more challenging word problems, the deaf students of all reading ability levels performed significantly lower than the hearing students, and the deaf students with high reading ability performed significantly better than the deaf students with middle and low reading ability. Accommodations Research The issue of providing testing accommodations to students with learning disabilities is a matter of serious debate due to the high-stakes nature of many current assessments. The goal of any accommodation is to level the playing field by ensuring that students disabilities in one area do not impact their ability to demonstrate achievement in another area. One of the most contentious accommodations currently being debated is reading the mathematics portion of the test aloud to students with reading disabilities. Tindal, Heath, Hollenbeck, Almond, and Harniss (1998) studied the impact of this accommodation on both regular and special education students. The control group used the standard administration of independently reading the items from the booklet, but the questions were read aloud two times for the students in the accommodation treatment. Students in the regular education curriculum showed no preference for test format, but students who were receiving special education services scored significantly higher with the read-aloud accommodation. These findings seem to support the use of this accommodation for mathematics test situations. 65

Johnson (2000) studied the same phenomenon and found that the accommodation did not impact the performance of students without disabilities, but approached significance for students with reading disabilities. The small group sizes (<40) limited the power of the statistical test to detect a difference; therefore the results must be interpreted with caution. However, the consistency of the findings with the results of the Tindal et al. (1998) study indicate that reading the mathematics questions aloud to students with reading disabilities does not affect the validity of the test. Fuchs, Fuchs, Eaton, Hamlett, and Karns (2000) work extended previous findings by focusing on how accommodations impact performance on different problem types. In their study, students with and without learning disabilities performed best on concept-applications problems when the questions were presented orally. With the problem solving test, however, learning disabled students performed significantly better under the oral presentation, but the presentation format had no effect on non-disabled students. The concept-applications problems contained significantly fewer words than the problem solving items, suggesting that reading was an important access skill only for the latter. For this reason, students performance on the concept-applications problems is counter-intuitive and deserves further study. Helwig et al.s (1999) findings contradict those of Fuchs et al. (2000). Helwig et al. found that mathematical problems that contained a multitude of challenging language factors tended to be solved more easily when read aloud as opposed to having students read the problems. The accommodation did not 66

have a significant impact on students performance with word problems containing only a few words. Within the high mathematics group, the differences between hearing the problem read aloud and independent reading were much greater for the low-ability readers than for the proficient readers. The low mathematics students significantly preferred having the items read aloud, but their lack of mathematical proficiency apparently precluded them from benefiting from the accommodation. The researchers concluded that because the results of the accommodation among strong math students discriminated between those with high and low oral reading fluency, part of what was being tested was students word identification skills. Although reading word problems aloud was not found to be beneficial for a majority of students, this accommodation may allow those who excel in mathematics yet struggle with reading fluency to demonstrate their true mathematical abilities. In a follow-up study, Helwig, Rozek-Tedesco, & Tindal (2002) sought to extend their earlier work by administering both standard and read-aloud versions of a mathematics achievement test to a large sample of students over a range of grade levels. They hypothesized that the students with low reading skill would perform better when test items were read aloud, especially for items that were linguistically complex. The main effect of reading ability was significant in all cases, however the results showed only limited evidence, and only at the elementary level, that reading the test items aloud was an effective accommodation. The researchers, surprised by these results, point to the possibility that students may have failed to take advantage of the read-aloud 67

accommodation, choosing to work on their own. No observations were conducted during the testing event to ensure that students actually followed the pace of the oral reading. In addition, even the most linguistically complex test items on this assessment were not as difficult as the items used in other studies that found the accommodation to make a significant difference for poor readers. It should also be noted that the low readers scored extremely low on the test, indicating that they may not have had the mathematics skills necessary to take advantage of the accommodation. Due to the conflicting results of some of these studies, and considering the high-stakes nature of any policies that may be enacted about testing accommodations, further research is still needed. Future studies must work to solidify the connection between problem types and the resulting impact of accommodations for both learning disabled and average achieving students. In addition, sample sizes must be large enough to ensure that statistical tests are able to detect any differences that may exist. Reading Comprehension Research An increasing amount of practice-based literature points to the relationship between reading comprehension and problem solving performance, yet the relationship is not reinforced in the classroom enough (Tovani, 2000), and the empirical research on this topic remains scant. Borasi, Siegel, and Fonzi (1998) found that encouraging students to talk, write, draw, and enact texts provided them with concrete ways of constructing meaning from the mathematical text that they read. They were correct in hypothesizing that the instructional strategies 68

drawn from the reading education literature, specifically transactional reading theory (Rosenblatt, 1978), would allow mathematics teachers to support sensemaking and discussion in their classrooms. Although this study focused on the reading of mathematical narrative texts rather than mathematical problems, the researchers suggested that the results were so promising that further research to explore the applicability of these reading strategies to problem solving is warranted. Morales (1998) examined whether childrens understanding of the semantic structure of word problems would impact their ability to solve the problems accurately. Using a four-way ANOVA, the study found that students who have a faulty conception of what the problem is about have more difficulty solving the problem. As a result, the researcher concluded that there is a need for placing greater instructional emphasis on the comprehension component of problem solving. The final study to examine comprehension looked at the impact of fourth-grade students reading comprehension ability on their choice of strategies in solving arithmetic word problems (Sovik, Frostad, & Heggberget, 1999). Students with good arithmetic ability, regardless of reading ability, chose similar strategies for solving the addition and subtraction problems. For the multiplication and division problems, the group with more proficient reading comprehension and high arithmetic ability used significantly more sophisticated strategies than the proficient readers with poor arithmetic ability. With the more mathematically difficult problems, proficient readers seemed to profit more from 69

their reading ability than with items that tested simpler arithmetic concepts. Although this study does show some relationship between reading ability and strategy selection, no significant conclusions can be drawn from the results due to the fact that no effort was made to include the readability level of the question stems as a variable, or to provide for a range of reading difficulty within each test set. The addition and subtraction items were significantly below grade level in both reading and arithmetic difficulty. A typical example is Jens had 2 apples and Ola had 7 apples. How many apples did they have in all? (p. 380). Because of this, it is not surprising that nearly all students in all groups answered all of the questions correctly, and the results showed no significant difference between groups. Gender Differences in Reading/Literacy Because reading ability directly impacts childrens achievement in problem solving, a review and discussion of gender differences in reading is in order. Recent statistics on boys performance in literacy show that boys account for 75 to 85 percent of those labeled at-risk of poor achievement progress in literacy (Rowe, 2000), and that girls consistently outperform boys in reading and writing (National Center for Education Statistics, 2000). Other studies show that boys are three to five times more likely than girls to be labeled reading/learning disabled (Young & Brozo, 2001), with a boy to girl ratio of 4:1 in learning disabled programs (Vogel, 1990), and a significantly greater number of boys than girls in remedial reading classes (Alloway & Gilbert, 1997). These same findings are being found across the globe, including Canada (Gambell & Hunter, 2000), the 70

United Kingdom (Barrs, 2000), and Australia (Rowe, 2000). In fact, the Program for International Student Assessment (PISA) found that girls scored significantly higher than boys on a test of reading literacy for all 32 of the countries that participated in the study (Lemke et al., 2001). Is this a new phenomenon? Research into the history of American education reported by Young and Brozo (2001) shows that as early as the late 1800s there was concern that boys were not doing as well as girls in school. Throughout history, a pattern emerges in which boys failures were blamed on factors external to the boys themselves: pedagogy, methods, texts, or teachers. When boys were successful, all credit was given to their intrinsic intelligence and ability. In contrast, girls successes have been attributed to pedagogy, methods, texts, or teachers, while their failures were blamed on the girls themselves and their lack of ability. The concern over boys academic performance resurfaced periodically until the 1960s when, for the first time, the focus shifted to girls. The result has been that girls are now consistently outperforming boys in all measures of academic success (Young & Brozo, 2001). Sources of Boys Underachievement There are many roots of underachievement in boys, the first of which is social in nature. Boys have a strong desire for membership in a Discourse of masculinity (Discourse with a capital D meaning ways of being, thinking, acting, talking, and reading; Gee, 1996). This Discourse comes into play in the literacy classroom when boys, wishing to be viewed by their peers as a certain sort of boy, will read, write, and think like other boys who are members of that particular 71

Discourse of masculinity (Young & Brozo, 2001). Unfortunately, the most prevalent masculine Discourse among school-aged boys includes a caveat that real boys dont do schoolwork, including literacy work. Jackson (1998) found that even academically successful boys felt the need to hide their commitment to academic work in order to be accepted by their peers. Boys come to school with a fairly fixed preference for certain kinds of literature (Young & Brozo, 2001), and denying them engagement with texts that appeal to them denies them access to meaningful literacy. When boys are not engaged with texts that appeal to them, we risk them developing the idea that reading is not a masculine undertaking (Hunsader, 2002b). The second root of the problem is biological. According to Gur (as cited in Mulrine, 2001), director of the University of Pennsylvanias Brain Behavior Laboratory, womens brains are an average of 11 percent smaller than mens, yet there is no gender difference in IQ. The female brain is simply more finely developed with a larger corpus callosum, the bundle of nerves that provides communication between the two sides of the brain. The result is that females verbal skills are superior to those of males. Boys have more white matter in the brain, making for superior gross motor skills, and their high volume of spinal fluid allows their brains to sustain more blows without injury. In addition, the portion of the brain that controls emotions is less developed in men than in women (Mulrine, 2001). The very things at which the male brain excels gross motor skills, visual and spatial skills, exuberance are often things that do not find a good reception in educational settings. 72

This lack of reinforcement of boys natural strengths has resulted in their perception that teachers like girls better (Dobson, 2000). Elementary classrooms, especially in the primary grades, use management techniques that are designed primarily by women to fit the temperament, learning styles, and natural tendencies of girls, and this is taking a toll on boys (Dobson, 2000; King, in press, Palardy, 1998). Gender is overtly and covertly integrated into school culture, primarily through the gender of the teacher. In the early grades, boys and girls are expected to master literacy while embedded in a classroom environment that is teacher-centered (Boggs, 1999). Due to the overwhelming preponderance of female teachers in the early grades, this teacher-centered classroom environment is more accurately termed a female-centered environment. The third source of difficulty for boys in the literacy classroom is a result of the increased demand for operational literacy in schools especially verbal reasoning and written communication skills as evidenced by changes in both school-based and standardized assessment. These are all areas in which girls have a distinct advantage (Rowe, 2000). For many boys, especially those termed late bloomers, the verbal reasoning and literacy demands being placed on them are often constrained by performance parameters that limit their ability to demonstrate what they know and can do. Rowe (2000) reports that boys reluctance to read, exacerbated by their recent increase in solitary play (video games), contributes to their literacy underachievement. In addition, the fact that girls social lives revolve around verbal communication, while boys social 73

experiences revolve around play, serves to widen the gap between the sexes in school-based literacy activities (Rowe, 2000). Newkirk (2002) acknowledges that boys currently enjoy a low status in the educational arena, but refuses to blame popular culture for their plight. Instead, he posits that because boys are avid consumers of visual media and TV is the primary entertainment for economically disadvantaged children, teachers have an ethical obligation to use this media creatively to enhance boys literacy development. The fourth cause of underachievement in literacy is a bit of a catch-22. Research indicates that the most significant predictor of student literacy achievement is student attentiveness. The most influential indicator of student attentiveness is gender, with girls being significantly more attentive than boys (Hill & Rowe, 1998; Rowe, 2000). The implication of this finding is enormous given the current move away from short answer assessments and activities to those that require extended periods of attentiveness. This change in pedagogy may be inadvertently exacerbating the differences in literacy achievement between boys and girls (Hunsader, 2002b). Interestingly, the relationship between attentiveness and literacy achievement holds true in reverse as well. Research shows that students early growth in reading skills has the result of reducing inattentive behavior, thereby improving academic achievement. The underachievement of boys leads to boys acting-out behaviors, inattentiveness, low self-esteem, and an unwillingness to participate in schooling. Conversely, it is literacy achievement that most strongly reduces inattentiveness and improves 74

both the educational and behavioral outcomes of students especially boys (Rowe, 2000). Highlighted Research on Gender Differences in Reading/Literacy A significant portion of the empirical research relating gender and reading comes out of Canada. In a large-scale study of students in Saskatchewan, male students were weaker than females in reading across all grade levels studied. Attitudinal data indicated that girls had higher self-concepts as readers and more positive attitudes towards reading than boys (Gambell & Hunter, 1999). A longitudinal study conducted in Eastern Canada tracked the reading achievement of a group of students as they progressed from first through sixth grade (Phillips, Norris, Osmond, & Maynard, 2002). In the first through third grade, there was a higher proportion of boys in the below-average category, and a higher proportion of girls in both the average and above-average categories. An interesting shift occurred by the end of fourth grade, however, with the relationship between gender and reading ability losing statistical significance. A similar phenomenon was found In British Columbia with a study of fourth, fifth, and sixth graders ability to read science text (Williams & Yore, 1985). Although overall results showed girls outscoring boys, there was a grade by sex interaction with boys outscoring girls by grade six. New Zealand, a country known around the world for its high standards and achievement in literacy (Guthrie, 1981), has recently experienced a downward trend in its international literacy standings caused in part by the declining literacy 75

achievement of its boys (Wagemaker, 1993). Wilkinson (1998) conducted an extensive study of the literacy habits and abilities of grade four students in New Zealand. Although girls outscored boys, three factors served to reduce the gender gap: female teachers, teachers who had spent more years in preservice training, and the provision of less reading materials in the classroom. The researchers hypothesized that female teachers may be more responsive to individual differences and more likely to address reading difficulties faced by boys at an early age. This hypothesis reveals a recent change in attitude towards the preponderance of female teachers in the elementary grades. In the 1960s and 1970s in the United States, female teachers were blamed for feminizing boys and harming their educational performance (Smith, 1973). The finding for teacher training indicated that education helped teachers cater to individual differences among boys and girls. The counter-intuitive finding about classroom reading materials may reflect a tendency of girls to make greater use of classroom materials when they are available, such that when materials are scarce, the advantage girls gain from using the materials diminishes. In a study relating reading to mathematics, Tartre and Fennema (1995) found that verbal skill was a significant predictor of mathematics achievement for boys, but not for girls. The researchers did not draw any conclusions about the cause of this phenomenon, but a potential hypothesis is that boys underachievement in literacy has resulted in verbal skill becoming a major factor in determining how boys perform in other content areas. 76

Two studies used survey data to understand the relationship between gender and literacy. Pottorff, Phelps-Zientarski, and Skovera (1996) studied the gender perceptions of elementary and middle school students about literacy in hopes of beginning to understand the causal relationship between gender and literacy achievement. The results indicated that both genders viewed girls as being more capable readers and writers than boys. Although boys viewed themselves more positively than girls viewed them, only a small percentage of boys saw themselves as being better at these activities than girls. The findings were consistent across grade levels and demographic populations. Commeyras (1999) surveyed literacy educators to assess their beliefs and interest in the sources of gender differences in literacy. She found that more educators believed that boys' and girls' behavior was a combination of biological and socio-cultural determinants than either cause alone, and that most teachers indicated some interest in knowing more about gender interest in literacy education. Acceptance that gender differences are, in part, socially constructed, coupled with educators' interest in gaining more knowledge, indicated teachers' belief in their ability to enact change and their interest in doing so.
Summary of Reading Research
Table 2 summarizes the results of the research studies examined in this section. The linguistic complexity of word problems appears to impact students' problem solving success (Bernardo, 1999; Fan et al., 1994; Leong & Jerred, 2001), and students with reading disabilities are at a disadvantage when mathematical problems are presented in a written context (Ackerman & Dykman,
1995; Fuchs & Fuchs, 2002; Jordan & Hanich, 2000; Jordan & Montani, 1997; Jordan et al., 2002). Studies of read-aloud accommodations did not have consistent findings, but generally speaking, reading disabled students benefited from having word problems read aloud to them in testing situations (Helwig et al., 1999; Helwig et al., 2002; Johnson, 2000; Fuchs et al., 2000; Tindal et al., 1998). Regarding the issue of gender, in most studies girls scored higher in reading and had more positive attitudes about reading than boys (Gambell & Hunter, 1999; Pottorff et al., 1996; Wilkinson, 1998). Two studies had slightly different results, one finding similar reading scores among boys and girls by the end of fourth grade (Phillips et al., 2002), and one finding that girls lost their advantage by the sixth grade (Williams & Yore, 1985).
Self-Regulated Learning
The Concept and Nature of Self-Regulated Learning
The topic of self-regulated learning (SRL) is a relatively new but important focus of study among educational psychologists. If researchers can begin to understand the questions of how, when, and why students take over the direction of their own learning, instructional strategies can be developed to support and encourage those behaviors. Zimmerman's (1990) definition of SRL focuses on three main components. He states that self-regulated learners are "metacognitively, motivationally, and behaviorally active participants in their own learning" (p. 4).
Table 2
Results of Reading Studies Grouped by Topic

Problem Wording
Bernardo, 1999. Participants: 283 Filipino students, grades 2-4. Results: Students scored higher on problems reworded in simpler language.
Fan et al., 1994. Participants: 29 1st graders. Results: Problem wording affected the difficulty of the problems.
Hegarty et al., 1995. Participants: 38 undergraduate psychology students. Results: Poor problem solvers relied on direct translation; good problem solvers developed a mental model of the problem.
Leong & Jerred, 2001. Participants: 91 3rd-5th graders. Results: Students performed better on consistently worded problems than on inconsistently worded ones.

Students with Mathematics and/or Reading Disabilities
Ackerman & Dykman, 1995. Participants: 65 students ages 7.5-12 years. Results: Students with disabilities in reading and math scored lower on a test of sequential memory than students with a reading disability only.
Fuchs & Fuchs, 2002. Participants: 40 4th graders with math disabilities but average intelligence. Results: Students with math and reading disabilities scored lower on story problems than those with only math disabilities.
Jordan & Hanich, 2000. Participants: 76 2nd graders. Results: Students with math and reading disabilities scored lower than all other groups; math-disability-only students were weak on story problems.
Jordan & Montani, 1997. Participants: 48 3rd grade students, half with mathematics disabilities. Results: Students with math disabilities struggled with problem solving efficiency; students with math and reading disabilities also struggled with problem conceptualization.
Jordan et al., 2002. Participants: 180 children studied through their 2nd and 3rd grade years. Results: Students who began the study with only reading disabilities finished at the same level as those who started with math disabilities.
Kelly & Mousley, 2001. Participants: 44 deaf and hearing college students. Results: Deaf students scored lower than hearing students on word problems.
Pau, 1995. Participants: 12 deaf children, 8-12 years old. Results: Deaf students' reading level was predictive of their problem solving ability.

Testing Accommodations for Disabled Students
Helwig et al., 1999. Participants: 247 6th graders. Results: Students performed better when problems with challenging language were read aloud.
Helwig et al., 2002. Participants: 1,343 students in grades 4, 5, 7, and 8. Results: Reading test items aloud only benefited the elementary students.
Johnson, 2000. Participants: 115 4th graders. Results: Reading test items aloud did not impact the performance of non-disabled students but approached significance for reading disabled students.
Fuchs et al., 2000. Participants: 373 4th graders, half with learning disabilities. Results: The read-aloud accommodation helped reading disabled but not non-disabled students.
Tindal et al., 1998. Participants: 481 9-12 year old students. Results: The read-aloud accommodation did not impact regular education students, but helped special education students.

Reading Comprehension Research
Borasi et al., 1998. Participants: 4 secondary math classrooms. Results: Use of transactional reading strategies improved sense-making and text discussion.
Morales, 1998. Participants: 199 Mexican-American students, grades 2-5. Results: Students with a faulty conception of problems struggled with solution processes.
Sovik et al., 1999. Participants: 102 4th graders. Results: Students with good arithmetic ability, regardless of reading ability, chose similar problem solving strategies.

Gender Differences in Reading
Commeyras, 1999. Participants: 1,530 literacy educators. Results: Educators believed student behavior is determined by both genetic and societal influences; educators desire to know more about gender and literacy.
Gambell & Hunter, 1999. Participants: 3,214 students in grades 5, 8, & 11. Results: Girls scored higher in reading, had higher self-concepts of themselves as readers, and held more positive attitudes about reading than boys at all grade levels.
Phillips et al., 2002. Participants: 187 students in grades 1-6, longitudinal study. Results: Girls performed better until the end of fourth grade, when achievement leveled out.
Pottorff et al., 1996. Participants: 730 students in grades 2, 4, 6, & 8. Results: Girls are viewed as better at reading and writing; few boys feel they are better at reading and writing than girls.
Tartre & Fennema, 1995. Participants: 60 students, longitudinal study of 6th-12th grade. Results: Verbal skill was a significant predictor of mathematics achievement for boys but not for girls.
Wilkinson, 1998. Participants: 2,949 grade 4 students. Results: Girls had higher reading scores, but differences were reduced by female teachers, highly educated teachers, and scarcity of classroom reading materials.
Williams & Yore, 1985. Participants: 358 4th, 5th, & 6th graders. Results: Although girls outscored boys overall in reading, 6th grade boys outscored girls.
Self-efficacy and self-assessment, two key variables of this study, are intertwined in these three components of SRL and will be discussed separately later in this section. Metacognitively, self-regulated learners set their own goals, monitor their progress, and self-evaluate throughout the learning process. In terms of motivation, self-regulated learners feel confident about their ability to be successful and are intrinsically interested in learning. Behaviorally, self-regulated learners actively work to create environments that are conducive to learning. Self-regulated learners are unique in their awareness of the connection between the learning strategies they employ and their success in accomplishing their learning goals. They persistently monitor the effectiveness of their learning strategies and make changes when they fall short of reaching their goal. They also independently determine what learning outcomes are worthy of their interest and effort, and make judgments about their level of strategy use based on the degree to which they desire an outcome. Hence, most SRL theories consider learning and motivation as two interdependent processes (Zimmerman, 1990). Unfortunately, getting students to use strategies correctly is much more complex than simply teaching them the strategies. The process of selecting and monitoring strategy use requires high-level reasoning that in many cases is developmentally acquired (Zimmerman, 1990). If educational psychologists were able to determine when instruction for each self-regulatory strategy was developmentally appropriate, and develop concrete methods for teaching students to self-regulate, educators would more likely accomplish the goal of helping all children become independent learners.
McCombs and Marzano (1990) voiced their concern that confusion about the interrelationships between metacognitive, cognitive, affective, and motivational aspects of SRL is negatively affecting the ability of research to bring about useful conclusions. This concern is well founded since the leading theorists in the field have very divergent views about the nature of SRL. McCombs and Marzano define self-regulated learning as the outcome of choosing to engage in self-directed metacognitive, cognitive, affective, and behavioral processes and skills (p. 52) They see SRL as being the result of both will and skill. In order to promote the development of SRL, interventions must target both of these elements. Winne (1995) sought to gain a deeper understanding of SRL through examination of empirical research on the subject that was not commonly cited in the literature. He acknowledged that self-regulating learners are more effective learners, and hoped to uncover specific interventions that would encourage self-regulatory behavior. He undertook his investigation for three main reasons. First, he believed that because so much of a students learning time is spent alone, understanding how learners develop and use SRL strategies, namely those strategies that students rely on when working alone, may help researchers develop better resources for SRL strategy instruction. Second, coming to understand how complex SRL behaviors develop in the absence of direct training may aid in the development of methods to support these behaviors. Finally, because SRL exists as a phenomenon, it is worthy of the effort required to understand it more fully. 82


Winne (1995) enlisted the help of Pat, a senior honors psychology student, to uncover the features unique to self-regulating learners. He found that self-regulating learners are aware of the challenges that may hinder their learning process, are deliberate in using the strategies that will enable them to achieve their learning goals, and are in control of their attitudes and thinking. His review of research led him to conclude that when learners exert effort to accomplish a difficult task and find success, they are more likely to expend high levels of effort on other tasks perceived to be difficult. He also found that low-ability learners avoid self-regulatory behaviors when faced with a task that demands application of a new skill. Sound knowledge about a specific domain enables students to approach a novel task with more confidence, and therefore increases the likelihood that they will engage in self-regulatory behaviors. Self-regulated learning involves knowledge, beliefs, and learned skills, and can therefore be impacted by environmental changes. Winne concluded that the solitary nature of much of students' learning efforts was just cause for more attention to be focused on researching the details of inherent SRL. In reaction to Winne's (1995) writing, Alexander (1995) focused on two elements of SRL that she believed were not properly portrayed. First, she posited that self-regulation that occurs in solitude has not been proven to be markedly different from that which occurs in social situations. Second, she argued that Winne failed to delineate the degree to which SRL is a general characteristic of the learner versus being related more to specific situations or content areas.


Alexander (1995) stated that the SRL behaviors that occur when learning alone, learning with others, and learning in the presence of others may not be as entirely distinguishable as Winne (1995) portrayed them. She claims that even learners who appear to be working alone may actually be engaging in a private mental dialogue with themselves as either audience or teacher. Also, it should not be assumed that learners' engagement in a social learning situation precludes the internal dialogue that might occur in solitude. Alexander also contradicts Winne's claim that self-regulating learners exert higher levels of effort than those who are less self-regulating. She notes that a learner effectively engaging in SRL may determine that his/her effort is not warranted by the task. Mindless engagement in trivial tasks should not be considered a positive quality associated with SRL. On the contrary, self-regulated learners should selectively engage in tasks that are meaningful. In addition to these differences in point of view related to SRL, Winne (1995) and Alexander (1995) also disagree on the nature of the self-regulatory behaviors associated with levels of competence in specific domain areas. Winne states that self-regulated cognitive engagement is "a deliberate, judgmental, adaptive aptitude of expert proportion" (p. 191). Alexander counters that for students who are novices in a particular domain area, even their greatest efforts at self-regulatory behaviors are not likely to be of expert quality. She claims that self-regulatory efforts are likely "less frequent, more difficult, and less rewarding" (p. 192) than when the learner is competent in the domain area. She believes that future research in SRL must take into account the context in which the


learning occurs, together with the learner's interest and competency level in that context. Pressley (1995) also responded to Winne's (1995) article with his own thoughts on the nature of SRL. Pressley views SRL as much more complex than Winne does, developing over years and decades, with real expertise not likely to develop until one has been a scholar at a high level for a long period of time. He also argues that self-regulatory strategies must develop over a long period of time before they can be transferred to new contexts. The reasons for this are fourfold. First, for new strategies to be integrated, they must prove themselves to be superior to older, more deeply ingrained strategies. Second, simply knowing how to use a strategy does not mean that the learner will know when and where to use it. Third, learning a strategy does not guarantee that the learner will fully comprehend the usefulness of the strategy. Finally, learning a strategy does not guarantee that the learner will know how to modify the strategy for use in a new situation. Pressley does not subscribe to Winne's optimism that teaching self-regulation strategies will result in their use. Pressley (1995) and Alexander (1995) agree in arguing against Winne's view of the solitary nature of much of students' learning experience. Pressley notes that even study that appears to be solitary may be influenced by many social inputs. He points out that though seatwork seems solitary, it is largely influenced by recent instruction that occurred in a social context. "The student who appears to be working alone is not thinking alone" (p. 210). Also, the child


who is working alone often has access to social support simply by raising her/his hand. In support of Winne's (1995) work, Pressley recommends adherence to some of the principles of teaching that Winne encouraged. Students should be provided with varied opportunities to realize that their effort is worthwhile. Students should be able to practice new strategies until they become routine before they are expected to use them in self-regulatory behaviors. They must come to understand that learning is difficult and requires effort, even for the most able learners, and that knowledge is subject to multiple perspectives.

Individual Differences in SRL

Winne (1996) followed his original article with new writing on individual differences in self-regulated learning and the elements that contribute to those differences. He begins by breaking the SRL process down into two phases. The first is to understand the task, outline goals, and make plans for accomplishing those goals. It is in this first stage that feelings of self-efficacy come into play. The second phase involves carrying out the plan and making adjustments to self-regulatory strategies in response to both metacognitive monitoring and feedback received from external sources. It is during this second phase that self-assessment plays an important role. Winne (1996) points to the lack of research using individual differences as an object of study. He cautiously outlines five potential sources of individual differences in the metacognitive aspects of SRL that may be fruitful sites for future study. All students vary in their domain knowledge, and an expert in one


area may have less need for self-regulatory behavior than a novice. Similarly, all learners differ in their knowledge of cognitive strategies. No two learners are identical in their ability to perform these strategies, depending on the degree to which they have utilized the strategies and have developed automaticity. Every learner differs in his/her ability to regulate tactics and strategies. Finally, students' dispositions and styles vary.

Classroom Applications of SRL

Paris and Paris (2001) contribute to the field of knowledge about SRL through their work on ways to apply research findings to the classroom. They point to the changing perspective in the field of educational psychology that increasingly seeks to link its work to interventions that can be implemented in the classroom for the benefit of students. They list six stages of instructional practice that support the development of self-regulatory behaviors. Teachers should begin by activating students' prior knowledge, followed by discussion and cognitive modeling of strategies. These stages are followed by mnemonic memorization, supported performance, and independent performance. These stages should not be considered strictly linear, as it may be necessary to return to one or all of the stages as students develop their SRL strategies. Successful interventions in the development of self-regulation tend to provide a variety of strategies with guidance on how, when, and why they may be utilized. Students need to see that their success in performing a task is the direct result of their self-regulatory strategies as opposed to either their innate ability or the skill of the teacher. Students should learn to value the input of their peers


on the use of strategies, and strategies should become a regular part of activities throughout the curriculum. The optimum classroom environment for the development of SRL is characterized by open-ended tasks that allow for some student choice, encouragement of collaboration, expectation of high performance, and facilitation of students' self-efficacy in dealing with tasks. Students need to be provided the opportunity to develop their self-regulatory behaviors indirectly through classroom experiences, directly through instruction, and also through practice (Paris & Paris, 2001). Assessment is another key to students' motivation to engage in self-regulatory behaviors. Self-assessment is related to the cognitive, motivational, and affective aspects of SRL. As students develop their ability to monitor their actions, they become more effective and accurate in assessing their own performance across a wider range of behaviors. They become more able to assess their level of comprehension, their level of effort, and their own interests. When students become proficient in detecting quality in their performance, their feelings of self-efficacy and perceptions of ability increase (Paris & Paris, 2001). Both internal and external factors affect students' self-assessment processes. Internal factors include self-appraisal (the evaluation of one's own ability) and self-management (the planning that results from monitoring one's ongoing behavior). External factors that impact self-assessment include the curricula and assessment activities selected by teachers. Closed tasks that do not allow students to pursue knowledge independently, and objective tests such as multiple-choice and true-false, do little to promote SRL. In contrast, authentic


assessment is by nature only possible in response to more open-ended and engaging tasks. Portfolios also provide a context for students to assess not only their current work, but also their growth over time (Paris & Paris, 2001).

SRL in the Field of Mathematics

Cognitive theory has succeeded in breaking down complex tasks such as mathematical problem solving into component processes and strategies. The goal of this effort is ultimately to improve instructional methods and student achievement. Current research has found that students who are successful in mathematical problem solving are strategic, and actively engage in using strategies to improve their performance. More specifically, good mathematics students possess and utilize metacognitive knowledge to direct their learning and problem solving. They know when, how, and where to use specific strategies, and they continually monitor the use of these strategies so they are able to make appropriate modifications throughout the problem-solving process. These students also possess sufficient mathematical content knowledge to be prompted when to use a particular strategy, and when strategic activity may be unnecessary (Braten, 1998). Students who can activate and persistently use their strategies, metacognition, and domain-specific content knowledge in mathematical problem solving are said to be practicing self-regulation. Children's self-directed or private speech becomes the primary instrument of self-regulation (Braten, 1998). During self-regulation, children, in effect, become their own teachers, using an unspoken literacy as their private guide through the problem-solving process.


Students must know how to monitor and regulate their own actions, develop self-awareness, and reflect on their own actions. These metacognitive behaviors are best learned and exercised in the context of real problem-solving events (Leinhardt & Schwartz, 1997). Metacognitive instruction has been found to benefit problem solving, reading comprehension, and writing; when instruction is provided for one of these processes, there is a positive carryover effect to the other processes (Kramarski, Mevarech, & Arami, 2002). The major goals of metacognitive instruction in mathematical problem solving are to help students comprehend the problem, construct connections between previous and new knowledge, use appropriate strategies to solve the problem, and reflect on the process and the solution (Kramarski, Mevarech, & Arami, 2002). Metacognitive strategy instruction should include work on both self-monitoring and self-regulation. Self-monitoring enables students to focus attention on their learning processes, and self-regulation helps students become goal-directed (Cardelle-Elawar, 1995).

Proficient Problem Solvers

A look at the cognitive and metacognitive characteristics of good problem solvers provides a picture of what goals teachers should set for every student. To begin, good problem solvers focus their attention on the problem and persist in considering all relevant information. They assess all the information before drawing a conclusion, and test or evaluate all their potential solutions before moving on (Kelly & Mousley, 2001). During the problem-solving process, good problem solvers make much greater use of self-explanations, the practice of


mentally talking through the problem situation, than do poor problem solvers (Reed, 1999). Children's ability to generate frequent and powerful self-explanations of a problem situation significantly predicts their success. The best learners take a holistic view of problems, focusing on the structural rather than the surface features of a problem, and are able to think ahead in the problem-solving process (Goldman & Hasselbring, 1997; Kramarski, Mevarech, & Arami, 2002; Wong, Lawson, & Reeves, 2002).

Less Proficient Problem Solvers

In contrast to the thinking processes of proficient problem solvers are the cognitive and metacognitive behaviors of less proficient problem solvers. Low-performing students often exhibit patterned behaviors when faced with a problem-solving task. They may read too quickly to gain full comprehension, may not reorganize information, may fail to realize that there may be more than one possible path to solving the problem, or may be uncertain how to calculate or verify their solution and give up (Cardelle-Elawar, 1995). Students who suffer from metacognitive deficits are less likely to be able to use strategies effectively to represent problems and execute solutions. This deficit impedes progress on tasks requiring considerable strategic activity, such as mathematical problem solving (Montague, 1997). Less proficient problem solvers often do not see a task as a whole, but instead focus on only parts of the task. They often score low on working memory tasks and have difficulty reorganizing information and distinguishing between relevant and irrelevant information (Passolunghi & Siegel, 2001; Kramarski, Mevarech, & Arami, 2002).


Language and cognitive skills are interdependent, with the result that students with language difficulties often have trouble with memory, attention, selection of appropriate strategies, and generalization of learning to new tasks. The metacognitive skills necessary for successful problem solving, such as self-regulation, awareness of one's own knowledge, and the use of cognitive strategies, are often lacking in students with language-based difficulties (Gurganus & Del Mastro, 1998; Landi, 2001; Stevens & Zaharias, 1997). Students with these disabilities seem to be aided by instruction in processing problem schemata so they can recognize and represent the situation described in a problem (Jitendra & Xin, 1997). Cognitive strategy instruction, which addresses students' reading comprehension and problem-solving deficiencies, seems to be a promising approach to helping students with learning disabilities apply their skills to become better problem solvers. Specific cognitive strategies include visualization, verbal rehearsal, paraphrasing, summarizing, and estimating. In order to benefit from this instruction, students must possess adequate reading and mathematical computation skills (Montague, 1997). For students who have a strong repertoire of problem-solving strategies but use them ineffectively, instruction in metacognitive strategies such as self-instruction, self-monitoring, or self-assessment may improve their ability to solve mathematical problems (Cardelle-Elawar, 1995; Montague, 1997).


Highlighted Research Findings in Self-Regulated Learning

Self-regulated learning has been defined differently by different researchers. For the purposes of this study, self-regulated learning is defined as students being "metacognitively, motivationally, and behaviorally active participants in their own learning" (Zimmerman, 1990, p. 4). Studies in various domains have determined that metacognitive knowledge is a good predictor of students' strategy use. However, very few studies have focused on the content area of mathematics, and only a small portion of those have focused on young elementary-aged children. These studies have found that teaching metacognitive knowledge about strategies to young children has the potential to improve their mathematical performance. Even high school students continue to benefit from this training because they often fail to monitor their own progress in problem solving (Braten, 1998). Cardelle-Elawar (1994) conducted a study to determine the effect of metacognitive instruction on low-achieving third- through eighth-grade students' mathematics achievement and attitudes. The study results indicate that metacognitive training in self-monitoring and self-regulation was effective in facilitating the problem-solving processes of low-achieving students. The process of self-inquiry developed in the treatment group seemed to build students' understanding by helping them reflect on the processes required to solve problems instead of being passive followers of procedure. Malpass (1996) investigated the effects of self-regulated learning, self-efficacy, goal orientation, and worry on high-stakes mathematics tests. Students were tested on these four


variables using a questionnaire after they completed the Advanced Placement calculus exam. The results of the study indicate that learning goal orientation was moderately related to self-regulated learning, and that self-regulated learning positively, but marginally, affected mathematics achievement. Regarding self-efficacy, Malpass found that it mediated the relationship between prior and current mathematics achievement, and was related to self-regulation. The relatively small sample, the fact that most participants were Asian-American, and the fact that mathematics achievement was measured in a high-stakes environment may have compressed the correlations and resulted in the appearance of a weak relationship between self-regulation and achievement. Fuchs et al. (2003) hypothesized that helping students to self-regulate would have a positive impact on their problem-solving abilities. They noted that previous research in the area of mathematics has focused primarily on computational skills. With the reform emphasis on problem solving and the fact that complex problem solving requires metacognition and perseverance, they chose to focus their research efforts on problem solving rather than computation. They randomly assigned third-grade teachers to one of three conditions: control (teacher-designed instruction), transfer (including teaching rules for problem solving, teaching for transfer, and cumulative review), and transfer-plus-SRL. In the transfer-plus-SRL treatment, students scored and tracked their own work, were encouraged each day to try to outscore their work from the previous day, and were periodically invited to share with the class how they were able to transfer their learning in mathematics to other situations. Their results indicated that


improvement of transfer varied as a function of condition, with the transfer-plus-SRL group scoring the highest, followed by the transfer group and the control group. On measures of self-regulation, a significant effect for treatment was found for all measures. On the questions assessing students' perception of their learning, effort, and improvement, the transfer-plus-SRL group outscored the transfer group. These results indicate that mathematical problem solving may be strengthened by explicit transfer instruction. Also, instruction designed specifically to increase students' self-regulatory behaviors promotes both self-regulatory processes and learning.

Summary of Self-Regulated Learning

The results of the highlighted studies on self-regulated learning are summarized in Table 3. With all that is known about self-regulated learning in a general sense, it appears that efforts to determine how to encourage and support this behavior are warranted. In the field of mathematics, with the current focus on problem solving, these efforts may further help teachers guide students to becoming independent learners. Whether we subscribe to Winne's (1995) notion of the solitary nature of learning, or to Pressley's (1995) and Alexander's (1995) view of the social nature of even seemingly solitary work, the impact of students' ability to regulate their own learning should not be underestimated (Cardelle-Elawar, 1994; Fuchs et al., 2003; Malpass, 1996). As both self-efficacy and self-assessment play a vital role in self-regulation and are key elements of this research, an in-depth review of both topics follows, with special emphasis on their relationship to mathematics.


Table 3
Results of Self-Regulation in Mathematics Studies

Author/Date | Participants | Results
Cardelle-Elawar, 1994 | 489 3rd- to 8th-grade students | Training in self-monitoring and self-regulation improved the performance of low-achieving students.
Fuchs et al., 2003 | 395 3rd-grade students | In the treatment that included SRL, students were better able to transfer learning to new problem-solving situations compared to the treatment without SRL.
Malpass, 1996 | 144 gifted high school students | SRL positively affected mathematics achievement. Self-efficacy is highly and negatively related to worry.

Self-Efficacy

Self-efficacy is defined as personal beliefs about one's capabilities to learn or perform specific behaviors and skillful actions. Researchers hypothesize that feelings of self-efficacy influence choice of activities, effort, persistence, and achievement. Compared with students who are not confident about their learning capabilities, those with high self-efficacy for accomplishing a task work harder, participate more readily, persist longer when difficulties are encountered, and achieve at a higher level (Bandura, 1986; Pintrich & Schrauben, 1992). Students who feel efficacious about learning are more persistent and are more likely to select effective strategies (Zimmerman, 1989). These relationships also hold when reversed, as proficient students tend to possess high self-efficacy for accomplishing academic tasks and believe they have the capacity to learn and solve problems at designated levels by exerting task-appropriate strategies (Bandura, 1986; Braten, 1998).


Mathematics self-efficacy is distinguished from other measures of mathematics attitudes in that it measures an individual's confidence in her/his ability to perform a specific task successfully, as opposed to a general feeling of confidence in the domain of mathematics (Hackett & Betz, 1989). Self-efficacy is context-dependent, depends on a mastery criterion of success rather than normative criteria, and is assessed prior to actual performance, so it can play a causal role in academic functioning (Hanlon & Schneider, 1999). Bandura (1997) believes that self-efficacy expectations are a major determinant of whether a student will attempt a particular task, how much effort he/she will be willing to expend, and how much persistence will be displayed in tackling the task in the face of obstacles. These beliefs of self-efficacy mediate the influence of other determinants of academic outcomes such as innate ability. Students' confidence in their own capabilities impacts what they do with the knowledge and skills that they possess. Therefore, cognitive theorists posit that students' academic performance is governed in large part by the confidence they exhibit when approaching academic tasks (Pajares & Miller, 1997). Bandura (1986) also cautioned that because self-efficacy judgments are domain and context specific, measures of self-efficacy should be specifically tailored to the domain and task being analyzed. He further noted that self-efficacy judgments should ideally be measured immediately before engagement with the task being assessed. "The relation between efficacy beliefs and action is revealed most accurately when they are measured in close temporal proximity. The closer in time, the better the test of causation" (Bandura,


1997, p. 67). Bandura (1986) also argued that the most functional efficacy judgments are those that slightly exceed what a student can actually accomplish, and that excessive overconfidence can result in "serious, irreparable harm" (p. 394).

Self-Efficacy Measurement and Relationship to Performance

Self-efficacy research in the field of education has focused on two primary areas: the relationship between self-efficacy beliefs and academic motivation and achievement, and the link between self-efficacy and college major or career choices (Pajares & Miller, 1995). When self-efficacy was still a young field of study, Dowling (1978) created a self-efficacy scale to measure the correspondence between confidence and performance. Her task-specific Mathematics Confidence Scale (MCS) allowed students first to provide judgments about their ability to solve math problems, then later to complete an alternate-forms test of the problems on which their confidence was assessed. Betz and Hackett (1983) then developed the Mathematics Self-Efficacy Scale (MSES), incorporating a measure of task-specific efficacy similar to Dowling's, with the addition of two scales. The first scale measures students' feelings of self-efficacy about performing math-related tasks, such as figuring income tax or the amount of wood needed to complete a project. The second scale measures students' confidence in their ability to earn an A or B in certain math-related courses. The MSES has been used in studies of both academic performance and career choice, but the correlations between the composite MSES score and actual performance were lower than the researchers anticipated (Hackett & Betz,


1989). The study also found that sex differences in mathematical self-efficacy were correlated with sex differences in mathematical performance. However, they found no support for their hypothesis that women's self-efficacy expectations were unrealistically low when compared to their actual performance. In contrast to the findings of the MSES study, Pajares and Miller (1994) obtained a correlation of .70 when they compared scores on a form of Dowling's (1978) MCS with performance scores on the same test on which self-efficacy was measured. They also found in a path analysis that task-specific self-efficacy had a stronger direct effect on problem-solving performance than an overall measure of mathematics self-concept, perceived usefulness of mathematics, mathematics background, number of college credits, or gender. Not surprisingly, researchers (Randhawa, Beamer, & Lundberg, 1993) who have used full-scale MSES scores as a means of measuring generalized mathematics self-efficacy have discovered much weaker relationships between self-efficacy ratings and performance. In Randhawa et al.'s study, the criterial task, solving mathematical problems, was related to only one of the three scales of the MSES, and the actual problems on the performance measure were different from those on which students judged their self-efficacy. The issue of concern in Randhawa et al.'s study is not the finding of a weak relationship between self-efficacy as measured by the MSES and performance, but rather the methodological flaw of attempting to find a correspondence between generalized self-efficacy and performance on a specific set of problems. Bandura (1986) cautioned researchers about the necessity of matching self-efficacy judgments, which are by definition task-specific,


to the actual tasks being assessed. However, his warnings have apparently been disregarded by some who fail to provide a match, either in specificity or content, between the tasks on which self-efficacy is measured and those on which performance is measured. Mathematics self-efficacy is defined as "a situational or problem-specific assessment of an individual's [feelings of] confidence in her or his ability to successfully perform or accomplish a particular task or problem" (Hackett & Betz, 1989, p. 262). In keeping with this definition, and following Bandura's (1986) guidelines for measuring self-efficacy, Pajares and Miller (1995) set out to determine whether students' judgments of their ability to solve mathematics problems were more predictive of their ability to solve those problems than either their confidence in solving other math-related tasks or in succeeding in math-related courses. In essence, they hoped to show that the three scales included in the MSES are most highly correlated only with the specific things they measure, and that attempts should not be made to correlate MSES composite scores with performance on specific tasks. In a study of 391 undergraduates, they found that although the three efficacy subscales were related to each other and were significantly related to problem-solving performance, the problem-solving self-efficacy measure had a stronger relationship with problem-solving performance than did either of the other two subscales. The results confirm Bandura's (1986) assertion that the most theoretically appropriate way to assess self-efficacy is through tasks that require the same or similar skills as those needed for the performance task.


Pajares and Miller (1997) took their previous research to a new level of specificity by examining whether the form of self-efficacy assessment (multiple-choice or open-ended question format) differentially predicts performance on multiple-choice or open-ended tests. In all measures of calibration, defined as the degree to which students accurately predict their performance, the format of the self-efficacy instrument did not impact the relationship between self-efficacy and performance. However, for both forms of the self-efficacy instrument, calibration scores were lower for the students who were tested with an open-ended test. Boys were overall better predictors of their performance than girls, but the interaction effect led to interesting results. Girls with low self-efficacy performed better than boys with low self-efficacy, yet girls with high self-efficacy did not perform as well as boys with high self-efficacy. Bong (2002) sought to confirm Bandura's (1997) claim about the necessity of measuring self-efficacy in close temporal proximity to measuring performance and with a high level of specificity. She found that the more specific measures of self-efficacy, those that measured self-efficacy on tasks that were very similar in content and context to the actual performance tasks, were superior to general measures when predicting outcomes. Contrary to the hypothesis, the relationship between self-efficacy beliefs and achievement scores did not differ much by the temporal distance between their assessments. It appears from this study that measuring self-efficacy at a level of specificity appropriate to the target outcome is more important than assessing it closer in time.


A meta-analytic investigation of the relationship between self-efficacy beliefs and academic outcomes helped to solidify the relationship between self-efficacy and performance (Multon, Brown, & Lent, 1991). The 39 studies included in the analysis were published between 1977 and 1988, included measures of self-efficacy and performance, and provided enough information to calculate pooled effect sizes. The results of the analysis support the notion of a positive relationship between self-efficacy and academic performance and persistence. They found a stronger relationship between self-efficacy and performance among low-achieving students as compared to normally achieving students. The strongest relationship between self-efficacy and performance was produced by the most specific assessment: basic skills. The second strongest relationship was found in the second most specific assessment: classroom-based performance. The weakest relationship existed with the most general assessment included in the study, that of standardized achievement tests. This finding lends support to Bandura's (1986) belief in the situation-specific nature of self-efficacy, but is cause for alarm with regard to students' relatively poor ability to predict their performance on standardized tests. This finding may be due to the relatively loose match between self-efficacy and performance in form, content, and timing for the achievement test, but in the current climate of high-stakes testing, it certainly deserves further study.

Self-Efficacy Versus Self-Concept

In order to add to the field of cognitive psychology, self-efficacy must be distinguished from other dimensions such as academic self-concept. In a


confirmatory factor analysis, Lent, Brown, and Gore (1997) found that although self-efficacy is related to other measures of generalized mathematics self-concept, it is neither interchangeable with nor subsumed by self-concept. Their results also indicated that self-efficacy is a significant predictor of math-related performance. Pajares and Miller (1994), and Pietsch, Walker, and Chapman (2003), found that efficacy beliefs have a stronger association with academic performance in mathematics than self-concept. In fact, in Pietsch et al.'s study, the path coefficient from self-efficacy to performance was significant, but the path coefficient from self-concept to performance was not. In support of Bandura's (1997) call for specificity when measuring self-efficacy, the students in the study appear to have considered each problem and their perceived competence to solve it as a unique event.

Self-Efficacy's Role in Self-Regulation

Bandura's (1986) triadic theory of social cognition posits that students' efforts at self-regulation are determined by three factors: their personal processes, the environment, and their behavior. Students' self-regulated learning is not believed to be an absolute state of functioning, but varies based on the academic context, personal efforts to self-regulate, and performance outcomes. The effectiveness of their self-regulatory strategies depends on the development of a sense of self-efficacy for learning and performing well (Schunk, 1995, 1996). Although self-efficacy has been shown to impact students' levels of self-monitoring and their academic motivation and achievement, little work has


been done to relate self-efficacy to students' use of these self-regulated learning strategies. Zimmerman and Martinez-Pons (1990) sought to show that students' level of academic efficacy is predictive of their use of triadic self-regulation strategies. They found that boys surpass girls in verbal efficacy but are comparable in measures of mathematical efficacy. Although girls exhibited more planning, goal-setting, record keeping, and monitoring behaviors than boys, overall, both boys' and girls' perceptions of mathematical and verbal efficacy were correlated with their use of self-regulated learning strategies. A glaring weakness of this study was the lack of performance data used in establishing the accuracy of students' efficacy perceptions. Academic self-regulation seems to be independently influenced by self-efficacy at the outset (Bandura, 1997), and by self-evaluation at the conclusion (Laveault, Leblanc, & Leroux, 1999). In a small study of middle school students, Laveault et al. found that although boys and girls performed similarly on mathematics tasks, boys had higher self-efficacy than girls. On completion, girls were more likely to report that the task was more difficult than expected. Students who clearly understood the task were more confident and set higher goals.

Self-Efficacy and Math-Related Career Choices

Although the relationship between self-efficacy and performance in mathematics is a critical issue, the long-term impact on students' choices of career lends further credence to the necessity of studying this phenomenon.


Hackett (1985) proposed that the gender differences in the numbers of male and female students who select mathematics-oriented college majors may be explained by documented gender differences in mathematics self-efficacy. Cooper and Robinson (1991) found a moderately strong relationship between mathematics self-efficacy and career self-efficacy among college students who had selected math-related majors. They also found that, for this group, there were no significant gender differences in mathematics self-efficacy. Unfortunately, this finding does not illuminate the relationship between mathematics self-efficacy and career choice because the participants had already selected math-related careers, and therefore all likely had high levels of mathematics self-efficacy. Also, although the fact that nearly eighty percent of the participants were male does highlight the imbalance of females in math-related majors, it hinders the generalizability of the researchers' conclusions. This study would have been much more effective in drawing conclusions about gender differences in efficacy if it had used a random sample rather than one composed exclusively of students who had already demonstrated their efficacy stances through their career choice. O'Brien, Kopala, and Martinez-Pons (1999) used path analysis to examine the contributions of prior mathematics achievement, ethnic identity, gender, and self-efficacy to students' science-related career interests. They found a strong direct link between self-efficacy and career choice, but while ethnic identity and prior achievement impacted self-efficacy, gender did not. The only variable in the study significantly impacted by gender was career interest. This research signals


a need for further work on the influence of mathematics self-efficacy on career choices.

Intervention Efforts for Performance and Career Choices

Although little is known about the relationship between mathematical self-efficacy, performance, and career choices, even less research has been conducted to explore the effects of intervention on these relationships. Hanlon and Schneider (1999) worked with a small group of high school graduates in an intensive five-week summer program aimed at improving the accuracy of students' self-efficacy judgments. Throughout the intervention period, students were asked to rate their level of self-efficacy on the current content and were then tested on that content. The intervention did not lead to a reduction in the discrepancy between students' math quiz scores and self-efficacy perceptions. However, it is impossible to determine whether the intervention actually proved beneficial because the course material increased in difficulty as the program progressed, possibly making it more difficult for students to assess their own abilities with accuracy. Fouad, Smith, and Enochs (1997) attempted to validate the use of a self-efficacy scale for assessing changes in middle-school students' career-related self-efficacy, outcome expectancies, and intentions and goals resulting from a career-related self-efficacy intervention. They found evidence of adequate reliability and validity for use of the instrument in assessing the results of the intervention program, but noted that intervention efforts with minority and female students did not result in significant gains.


Schunk and Gunn (1986) conducted the only published study found that addressed the impact of intervention on the mathematical self-efficacy of elementary-aged students. They provided intensive training in solving division problems to determine the effect of the intervention on children's self-efficacy and subsequent performance, as well as how attributions affected children's self-efficacy and skills. They found a strong relationship between the intervention and performance, and between self-efficacy and performance, but not between the intervention and self-efficacy. They also found a negative relationship between effort attributions and self-efficacy, indicating that when students become more confident in their ability to solve a problem, they are less likely to attribute their success to effort. Although the results are interesting, they have limited applicability to the reform-oriented classroom because the only mathematics task involved was division computation. Computation skills have received reduced emphasis in the classroom as a result of the recommendations of the National Council of Teachers of Mathematics (1989). In addition, the sample size was too small to prove stable for use in multiple regression, and the researchers' classification of algorithmic procedures as strategies is problematic. The term "strategies" implies a deep level of cognition more often associated with problem solving than with the simple application of step-by-step procedures used to solve the computation exercises described in the study.

Gender and Self-Efficacy

The relationship between gender and self-efficacy has not been as fully explored as the relationship between self-efficacy and mathematical


performance, and the findings of gender-related research are not consistent (Junge & Dretzke, 1995; Pajares & Miller, 1997). Although some research supports the notion that females are less self-efficacious than males in mathematics content (Laveault et al., 1999) and in relation to mathematics courses (Lent, Brown, & Gore, 1997), other studies have found no significant gender differences (Cooper & Robinson, 1991; Hackett & Betz, 1989; Lundeberg, Fox, & Puncochar, 1994; Zimmerman & Martinez-Pons, 1990). The only consistent finding is that when gender differences have been detected, males have been more confident than females in their mathematics ability (Laveault et al., 1999; Randhawa et al., 1993). Lundeberg et al.'s (1994) study of college psychology majors did support the notion that men are more confident than women in mathematics, but since both men and women were overconfident about the accuracy of their answers, it did not support the notion that women generally suffer from low confidence. Also, women and men answered very differently when estimating their general feelings of confidence versus their confidence in the accuracy of their answers on specific problems. These results confirm the findings of other studies that gender differences in confidence depend on the level of specificity and content of the questions being asked (Laveault et al., 1999; Randhawa et al., 1993). In a study of mathematically gifted students, Junge and Dretzke (1995) found statistically significant gender differences even though the self-efficacy ratings of most students were high due to their gifted status. Although the males


exhibited higher overall mathematics self-efficacy, the older students, both males and females, were more self-efficacious than their younger counterparts. Williams (1994) focused her research on understanding gender differences in the discrepancies between self-efficacy and performance across four subject domains: English, mathematics, reading, and science. Her findings concurred with other studies in establishing a positive relationship (explaining 35% of the variance) between self-efficacy and performance across content areas for both genders. Interestingly, the relationship between self-efficacy and performance was stronger for mathematics than for any other content area for both males and females. Most students' efficacy scores were not congruent with their performance scores, but males' efficacy estimations were more accurate than females'.

Self-Efficacy Summary

Table 4 provides a summary of the studies reviewed in this section.


Table 4
Results of Self-Efficacy Studies Grouped by Age of Participants

Author/Date | Participants | Results

Studies of College-Aged Students
Cooper & Robinson, 1991 | 291 college math-related majors (229 men, 61 women) | Self-efficacy is related to performance and career self-efficacy. No gender differences exist.
Hackett & Betz, 1989 | 262 college students (109 men, 153 women) | Moderate correlation between mathematics self-efficacy and performance using the MSES. No gender differences in self-efficacy.
Hanlon & Schneider, 1999 | 17 new college students attending a summer program | Intervention in self-efficacy did not prove beneficial.
Lent, Brown, & Gore, 1997 | 205 psychology students (54 men, 151 women) | Self-efficacy and self-concept are unique dimensions. Self-efficacy can predict math performance.
Lundeberg et al., 1994 | 254 college psychology majors (73 men, 181 women) | Most students were overconfident; women had more accurate perceptions of their incorrect answers than men.
Pajares & Miller, 1994 | 350 undergraduates (121 men, 229 women) | Self-efficacy had a stronger direct effect on performance than gender, high school math level, or college credits.
Pajares & Miller, 1995 | 391 undergraduates (144 men, 247 women) | Self-efficacy ratings of problems were more highly correlated with problem performance than were self-efficacy ratings on tasks or courses. Used the MSES.

Studies of High-School Students
Bong, 2002 | 235 high school freshmen (all-girl school) | Specificity of self-efficacy measures is more important than temporal proximity.
Junge & Dretzke, 1995 | 113 gifted grade 9-12 students (58 boys, 55 girls) | Males and older students were more self-efficacious, though all rated high.
Pietsch et al., 2003 | 416 high school students | Math self-efficacy predicted performance, but self-concept did not.


Table 4 (Continued)

Author/Date | Participants | Results

Studies of High-School Students (continued)
Randhawa et al., 1993 | 225 high school students (117 male, 108 female) | Mathematics self-efficacy mediates the relationship between mathematics attitudes and achievement. Boys were more self-efficacious, but did not perform better than girls. Used the MSES.
O'Brien et al., 1999 | 415 eleventh-grade students (221 boys, 194 girls) | Self-efficacy is strongly related to career choices, and ethnic identity is a predictor of self-efficacy. Gender predicts career interest.
Williams, 1994 | 131 college-bound 11th- and 12th-grade students (43 male, 88 female) | Self-efficacy was a stronger predictor of math performance than of performance in other subjects. Most students did not accurately predict their outcomes.

Studies of Students in Elementary and Middle Grades
Fouad et al., 1997 | 361 seventh and eighth graders | The Middle School Self-Efficacy Scale is valid, but intervention efforts for females and minorities were not successful.
Laveault et al., 1999 | 45 grade six to eight students | Boys had higher self-efficacy than girls; on completion, girls thought the tasks were harder than boys did. Performance was equal.
Pajares & Miller, 1997 | 327 eighth-grade algebra and pre-algebra students (149 boys, 178 girls) | The test format, not the self-efficacy format, impacted calibration scores. Boys were overall better predictors of their performance than girls.
Schunk & Gunn, 1986 | 50 upper-elementary students (28 boys, 22 girls) | Intervention efforts and self-efficacy had a positive impact on performance, but the intervention did not increase self-efficacy.

Studies of Students Across Age Groups
Multon et al., 1991 | Meta-analysis | Self-efficacy is positively related to performance and persistence. The strength of the relationship varies according to student ability and type of outcome measured.
Zimmerman & Martinez-Pons, 1990 | 180 students (60 fifth-graders, 60 eighth-graders, 60 eleventh-graders) | Girls plan, monitor, and goal-set more than boys. Boys have higher verbal efficacy but the same math efficacy as girls. Efficacy impacts use of self-regulatory strategies.


The research on self-efficacy related to mathematics confirms the task-specific nature of the construct (Bong, 2002; Pajares & Miller, 1994; Pajares & Miller, 1995; Randhawa et al., 1993), and establishes self-efficacy as a key element of self-regulation (Bandura, 1997). Mathematics self-efficacy is positively related to academic performance, persistence, and math-related career choices (Multon et al., 1991; O'Brien et al., 1999), but intervention efforts have thus far not proved successful (Fouad et al., 1997; Hanlon & Schneider, 1999; Schunk & Gunn, 1986). Both males and females tend to be overly confident of their mathematical ability (Junge & Dretzke, 1995; Lundeberg et al., 1994), but males' self-efficacy surpasses that of females (Laveault et al., 1999; Randhawa et al., 1993), and students tend to grow less self-efficacious as they mature (Junge & Dretzke, 1995). For these reasons, continued research efforts to understand the relationships among mathematical self-efficacy, performance, and gender are warranted.

Self-Assessment

Over a quarter-century ago, Lichtenstein and Fischhoff (1977) revived interest in metacognitive processes in general, and self-assessment in particular, with the question: "Do those who know more also know more about how much they know?" (p. 159). Students' ability to self-assess their knowledge and the skills that need improvement is a valuable self-regulatory strategy (van Kraayenoord & Paris, 1997) and provides teachers with useful assessment data (Charles, Lester, & O'Daffer, 1987). When students become proficient in monitoring and interpreting their actions, they gain the ability to assess their work across a wider range of cognitive skills with more accuracy (Rosenholtz & Simpson, 1984). To self-assess with accuracy, students must first internalize the standards against which their performance should be judged, and as the accuracy of their assessments increases, their perceptions of ability and efficacy increase (Schunk, 1989). Students who understand their personal strengths and weaknesses and the strategies they can use to enhance their performance are better able to plan and monitor their work (van Kraayenoord & Paris). As a result, self-assessment distinguishes itself as including all three domains of self-


regulated learning: cognitive, motivational, and affective (Paris & Paris, 2001). The National Council of Teachers of Mathematics (1995) also supports the classroom practice of self-assessment in its Assessment Standards for School Mathematics. According to the Assessment Standards, student self-assessment serves to improve students' confidence in their ability to do mathematics, and allows them to become more independent learners of mathematics. The accuracy of students' self-assessment attempts is measured primarily by comparing their self-assessment scores to their actual performance. Confidence bias is defined as the systematic error of judgment made by individuals when they assess the correctness of their responses to questions relating to intellectual or perceptual problems (Pallier, 2003). The correspondence between subjective probability (i.e., a personal assessment of accuracy) and the actual result provides a measure of calibration with which to assess confidence bias. When an individual's average confidence score exceeds his or her average performance score, the bias score is positive and the person is said to be overconfident. A negative bias score indicates underconfidence.
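To make the calculation concrete, the bias score just described can be written as the difference between mean confidence and mean performance (the figures below are hypothetical and are not drawn from any study reviewed here):

bias = C − P

where C is a student's mean confidence rating across a set of items, expressed as a proportion between 0 and 1, and P is the proportion of those items answered correctly. For example, a student who reports an average confidence of .78 across twelve problems but answers only six of them correctly (P = .50) receives a bias score of .78 − .50 = +.28 and would be labeled overconfident, whereas the same performance paired with an average confidence of .40 yields .40 − .50 = −.10, indicating underconfidence.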


Several theoretical models have been developed to explain the phenomenon of overconfidence and underconfidence, the most prominent of which are the heuristics and biases approach, the ecological approach, and the individual differences perspective. In brief, proponents of the heuristics and biases approach claim that errors in confidence judgments occur due to errors in thinking or procedure (Kahneman & Tversky, 1996). In the ecological approach, overconfidence results from the nature of the questions on general knowledge tests, which provide misleading cues intended to trick students into choosing an incorrect answer. The individual differences perspective, based on the work of Pallier et al. (2002), suggests that the cause of miscalibration is an independent metacognitive trait that mediates the accuracy of self-assessment. Within this framework, individuals tend to express a consistent confidence level, irrespective of their accuracy level.

Highlighted Research on Self-Assessment

Confidence bias was the focus of Pallier et al.'s (2002) dual research studies. The participants in the studies self-assessed the accuracy of their responses to questions covering a wide range of cognitive and perceptual abilities. As found in other studies, the confidence factor transcended any single facet of cognitive ability, indicating that the subjects possessed a trait that mediated their ability to evaluate the accuracy of their responses. This confidence trait relied on the individual's cognitive ability and, to a small extent, on some aspects of personality, and was a major determinant of the accuracy of self-assessment. Unfortunately, because no gender data were reported in these studies, it is impossible to draw any conclusions as to how gender contributes to the confidence factor. Although Halpern and LaMay (2000) concluded from a review of current research that no gender differences in general intelligence exist, they acknowledge the existence of gender differences in performance on specific tasks. Concerned that an overall measure of IQ may fail to identify confidence biases that exist on tasks that involve different cognitive abilities, Pallier (2003)


followed up on his original work with a pair of additional studies aimed at uncovering gender differences in the accuracy of self-assessment. The college men and women in the first study had nearly identical accuracy scores, but the confidence and bias scores indicated significant gender differences. Men were more confident in their performance on both tasks tested, resulting in women being better calibrated on the general knowledge task, and men on the visualization task. In short, gender affected the accuracy of self-assessment. The participants in the second study were adults of all ages, and the results were similar to those of the first study. The men exhibited higher confidence than the women, and their confidence levels increased as they aged. As a result, women were better calibrated than men, but the accuracy of all participants' self-assessment declined with age. Hassmen and Hunt (1994) did not find any significant gender differences in the accuracy of students' self-assessment or in their level of confidence, but found that females benefited from being asked to assess their own work. Men outscored women on the test in both the control and treatment groups, but the score gap decreased by a significant amount in the treatment condition, where students were asked to assess their answers. A follow-up study conducted by Koivula, Hassmen, and Hunt (2001) found similar results, but no interaction between treatment and gender. As in the earlier study, men outscored women on the quantitative test, but all students benefited from the condition of self-assessing their test question responses. These findings together support the


notion that self-assessment plays a vital role in self-regulation and is related to improved performance. Wright and Houck (1995) studied high school students to examine possible gender differences in the accuracy of self-estimates of academic ability. On average, males tended to hold higher self-assessments of their verbal and mathematical ability compared to females. However, contrary to other research findings, both males and females underestimated their performance on the verbal and mathematics tests. One interesting finding for males was that their own self-assessment was more highly correlated with their test scores than either the teachers' assessments or the students' GPAs. The biggest threat to validity in this study involves the measurement of students' self-assessment, and may explain why the results were not in line with other research. The students were not asked to self-assess their test performance until two weeks after the tests were administered. It is possible that the time delay resulted in the students self-assessing their general verbal and mathematics ability more than their performance on the test itself. The time delay may also be responsible for the finding that males underestimated their performance, because other similar studies found the opposite relationship. Ideally, the self-assessment should have either been embedded in the test or administered immediately after the test. In a study of upper-elementary-aged students, van Kraayenoord and Paris (1997) found that girls were more accurate in providing self-assessments of their work across content areas. Vermeer, Seegers, and Boekaerts (1994) worked with the same age group, but obtained students' measures of self-assessment at


three stages of the mathematical problem solving process: orientation, execution, and verification. They found that the gender differences in confidence ratings were dependent on the type of problem being assessed. For an algorithmic problem, girls and boys were equally confident at the orientation and verification stages, but girls were more confident than boys during the execution stage. For word problems, however, boys were more confident than girls at all stages. Among the students who were successful on the word problems, girls showed less confidence in their work than boys until the verification stage, where their ratings were similar.

Summary of Self-Assessment

Most of the self-assessment research summarized in Table 5 points to overconfidence in males and better calibration in females, with both genders showing better performance as a result of assessing their own work (Hassmen & Hunt, 1994; Koivula et al., 2001; Pallier, 2003; Vermeer et al., 1994; Wright & Houck, 1995). Self-assessment, as a self-regulatory strategy, would appear to benefit learners of all ages (Hassmen & Hunt; Koivula et al.). Because development of self-regulated learners is a goal of education, the literature appears remiss in its limited attention to the possible relationships among self-assessment, performance, and gender in children under high school age. A majority of the research in this area deals with college-age students (Hassmen & Hunt; Koivula et al.; Pallier et al., 2002; Pallier, 2003), a population that is easy to access for research purposes, but is nearing the end of formal education. Research on this age group, although important for adding to the field of knowledge, has less potential to effect changes in pedagogy that may help develop self-regulated learners than does research with younger children. Also, only one of the two studies conducted with elementary-aged children was specific to mathematics (Vermeer et al., 1994), and the strong gender differences found in students' self-assessment at that age, together with the evidence that students' ability to self-assess their work varies by task, certainly indicate the need for further study.


Table 5
Results of Self-Assessment Studies

Hassmen & Hunt, 1994. Participants: 120 undergraduate students. Results: Females benefited from being asked to assess the accuracy of their responses.

Koivula et al., 2001. Participants: 574 students (494 high school and 80 first-year college). Results: Both males and females benefited from being asked to assess the accuracy of their responses on a quantitative test.

Pallier et al., 2002. Participants: Study 1: 520 Air Force recruits; Study 2: 107 undergraduate psychology students. Results: Study 1: Confidence is an independent factor, and transcends type of cognitive ability. Study 2: Confidence plays a major role in the accuracy of self-assessment.

Pallier, 2003. Participants: Study 1: 185 undergraduate psychology students; Study 2: 303 adults ages 17-80. Results: Study 1: Men assessed themselves higher than women, but gender differences in calibration were task dependent. Study 2: Women were less confident, but more accurate in self-assessment, than men.

van Kraayenoord & Paris, 1997. Participants: 93 students in grades 3, 4, and 5. Results: Girls provided more accurate self-assessments than boys, across content areas.

Vermeer et al., 1994. Participants: 51 students 11-12 years old. Results: In problem solving, boys were more confident of their work than girls at all stages of the process.

Wright & Houck, 1995. Participants: 222 students in grades 9-11. Results: Males had higher self-assessments than females, but both genders underestimated their performance on the verbal and mathematics tests.


Written Communication in Problem Solving

Traditional school mathematics has emphasized the receptive aspects of language, requiring students to attend to the way the teacher and the text communicate information. To empower students mathematically requires that they be invited into the conversation and allowed to express their own conceptions and understandings both verbally and in writing (Miller, 1993). Both written and oral language skills support children's problem solving processes (Kroll & Halaby, 1997), and an open channel of communication with and between students also promotes reading comprehension (Fuentes, 1998). Students need a multitude of opportunities to communicate about mathematics in order to write about it effectively (Fortescue, 1994), and having students justify their answers, think aloud, and consider different possibilities for solving problems provides these opportunities (Fuentes, 1998). Writing in mathematics has been receiving increased emphasis in recent years due to its perceived impact on learning (Johnson, Jones, Thornton, Langrall, & Rous, 1998). Writing is fast becoming a major component of teaching and learning as well as a way to assess students' understanding. It is being successfully used in problem solving instruction through the writing and solving of problems, writing about how to solve a problem, comparing and contrasting


various approaches to solving a problem, and preparing reports or plans for more involved problem situations (Pugalee, 1998). Writing and problem solving are fundamentally linked (Pugalee, 1998). First, writing forces the child to condense inner speech so that it is understandable, structured, and able to be communicated (Vygotsky, 1987). Second, problem solving and writing processes are by their nature recursive rather than linear. In problem solving, as in writing, the student revisits previous decisions to determine how to move forward. Writing during the problem solving process encourages recursive behavior within the planning, evaluating, revising, and editing stages of the composing process (Pugalee, 1998). Challenges and Benefits of Written Communication Writing about mathematical ideas is even more difficult than reading about them. Writing requires that students use language to explore partly-formed understandings and refine their ideas. Though it is difficult, writing about mathematics is an invaluable tool for helping students explore concepts, engage in high-level thinking, and actively construct meaning (MacGregor, 1990). The notion that children learn to write by writing has been adopted by many in the field of mathematics and taken to mean that children will learn to write mathematics by writing mathematics. However, without the proper guidance, modeling, and scaffolding, the result of requiring students to write about their problem solving will likely be trivial responses that lack mathematical substance (Pengelly, 1990). When students are properly supported and routinely required 120


to communicate their mathematical understanding in writing, they become more proficient in doing so (Miller, 1993). One of the benefits of using writing during problem solving is that when students write about all the steps, procedures, and processes involved in solving a problem, they are more able to understand the steps than when they do not write about their problem solving. The result is a greater likelihood that the student will reflect on, evaluate, and modify each of the steps (Kenyon, 1989; Kroll & Halaby, 1997). Writing also helps students become actively involved in the problem solving process, providing a space for them to reason and understand the problem situation as well as clarify and justify their approach (Goldsby & Cozza, 2002; Pugalee, 1998). It helps students focus on their understanding of the nature of problems and what approaches are possible rather than focusing on algorithms and facts. Finally, writing provides an avenue for bringing students everyday world and problem solving together in a meaningful context (Hildebrand, Ludeman, & Mullin, 1999), a connection encouraged by NCTM (2000). Improved conceptual understanding is another perceived benefit of using writing to communicate about mathematical processes (Quinn & Wilson, 1997). Writing about problem solving helps the writer integrate the mathematical concepts into his/her own thought processes (John-Steiner, 1989). Writing involves processes necessary to mathematics learning that otherwise may not be engaged (Johnson et al., 1998), and helps students create meaning in problem solving (Pugalee, 1997). Writing about mathematical concepts or solutions to 121


problems allows students to refine their thinking about important mathematical concepts through the critical examination of concepts that are required during the writing process (Woodward, Monroe, & Baxter, 2001). The writing process allows students to communicate, clarify, organize, formalize and reflect on their mathematical thinking, increasing their recall and helping them gain a better understanding of mathematical relationships (Burns, 1995; Drake & Amspaugh, 1994; Fortescue, 1994; Johnson et al., 1998). Metacognition in Writing When students are required to write about a problem they are forced to clarify their thoughts (NCTM, 2000), making the writing process an integral part of the thinking process (Kenyon, 1989). Students engaged in writing about their problem solving processes must critically reflect on their thought processes from the beginning to the end, helping them clarify their thoughts and reflect on their ideas and what they have learned (Kroll & Halaby, 1997; Krulik & Rudnick, 1994; Powell, 1997). The process of writing aids students in the development of their thinking skills in mathematics, helping them to see reflection and synthesis as a normal part of the problem solving process (Muth, 1997; Pugalee, 1997). While students work through the writing process, they are practicing the very behaviors that produce independent learners who take responsibility for their own learning, namely monitoring and reflecting (Fuentes, 1998; Pugalee, 1998). By making students more aware of their own thought processes, writing promotes a clear understanding of conceptual relationships and an enhanced personal ownership of knowledge (Shepard, 1993). The ability of the writing 122


process to place the learner at the center of his/her own learning makes writing a powerful facilitator of learning anything that involves language (Powell, 1997). Through writing, students improve their reasoning and problem solving abilities while learning to analyze and reflect on their own thinking (Ciochine & Polivka, 1997; Hartman, 1994). Writing enables students to reframe information in their own words, deepen their understanding of a problem situation, and assist them in ordering their thoughts (Cook, 1995), conveying their own understanding, and analyzing their own errors (Hartman, 1994). Writing as Assessment Data Writing makes childrens thinking visible (Hartman, 1994), providing the teacher with a window into the students understanding of the process (Cook, 1995). However, caution must be taken when using students work to assess understanding because written computational work often does more to show what the child cannot do than what she/he can do (LeBlanc, Proudfit, & Putt, 1980). To assess students conceptual understanding, the teacher must have access to what students were thinking as they worked through the problem. Asking students to explain how they know their answer is correct will elicit much more linguistic information about students thinking than simply asking students to show their work. When properly facilitated, childrens writing in problem solving can also reveal misconceptions and gaps in understanding that may not be exposed through traditional assessments (Krulik & Rudnick, 1994; MacGregor, 1990). When students are only required to produce correct answers, teachers have no 123


way to differentiate between a student who happens on the correct answer without understanding the problem and a student who fully understands the problem but simply made a computational error (Goldsby & Cozza, 2002). Some students can produce accurate computational work, but are not able to articulate their processes in writing because they do not truly understand but are merely mimicking the process demonstrated by the teacher (Dusterhoff, 1995). When students are required to justify their processes, their writing can help teachers discern their level of understanding as well as the reasoning behind the mistakes they made during the problem solving process (Drake & Amspaugh, 1994; Powell, 1997). Teachers who are equipped with this level of assessment about their students understanding will be prepared to make informed decisions when planning subsequent instruction (Burns, 1995; Dusterhoff, 1995). When using students writing about problem solving for assessment purposes, the key items to evaluate are how well the student understood the problem; if the student used productive strategies to solve the problem; if the solution was verified; and whether the composition was organized (Ciochine & Polivka, 1997). As much as writing in the mathematics classroom promises a multitude of benefits, it can also present monumental challenges to students with learning disabilities (Woodward, Monroe, & Baxter, 2001) and for whom English is not their native language (Kang & Pham, 1995). Care must be taken when using student writing for the assessment and evaluation of students with writing disabilities. These students are often unable to communicate their thought process in writing, have trouble using written language as an organizational tool, 124


and may avoid writing altogether with the result that their teachers view them as less capable than they really are (Gurganus & Del Mastro, 1998; Peressini & Bassett, 1996). Highlighted Research on Writing and Problem Solving The literature is replete with articles discussing the benefits of writing, methods for incorporating writing in the mathematics classroom, and encouragement for mathematics teachers to begin making writing a key element of their instructional practice. However, due to the relative dearth of research on writing in problem solving (Johnson et al., 1998), the burden is on teachers to draw their own conclusions based on their practice. The few studies that have been conducted do provide support for the integration of writing into mathematical problem solving. In one such study, Pugalee (2001) employed qualitative methods to determine the extent to which students linguistic descriptions of their problem solving methods show evidence of metacognitive behaviors. An analysis of the childrens writing revealed obvious metacognitive frameworks in the orientation, execution, and verification phases of problem solving. Johnson et al. (1998) studied both writing to learn and learning to write in the context of probability. Using multiple measures, the researchers analyzed the changes in the level of students thinking and writing about problem situations involving probability that were brought about by the intervention of ten 45-minute writing sessions held over five weeks. During the sessions, students responded to a problem prompt in their journals, discussed the problem as a class, worked 125


in groups to solve the problem, then shared their thinking back in the whole-class format. The researchers found a significant positive change in students' probabilistic thinking and cognitive level in writing from pre-test to post-test. Further analysis showed that the students who made gains in both the content of probability and their writing proficiency benefited from having to write justifications for their solutions and explanations for their drawings. Although the sample sizes in the Pugalee (2001) and Johnson et al. (1998) studies were small, the results do support reform efforts to increase writing in mathematics and raise questions about the potential for writing to serve as a support for the metacognitive behaviors considered important for problem solving. The results from Rudnitsky, Etheredge, Freeman, and Gilbert's (1995) large-scale writing study indicate that students in the treatment group who engaged in writing their own problems performed better on the posttest than either the control group or the group that strictly did problem solving without problem writing. In addition, their overall superiority actually increased on the retention test. Noticeably absent from the literature are any studies that examine how including linguistic descriptions of problem solving processes as a scoring criterion in high-stakes assessment differentially impacts students. Also, does requiring linguistic explanation truly differentiate between students who chance upon a correct answer and those who understand the mathematical process, or is it more a measure of writing ability? If the latter is true, then the validity of the measure is questionable. The former is certainly the goal, but if the latter is the


case, then holistic rubric scores that include linguistic explanation as a criterion may be misleading. Gender Differences in Writing Mathematical reform efforts and research evidence point to the need for the integration of writing into mathematical problem solving and assessment. As changes are made, especially those with high-stakes ramifications, the possibility that gender differences in writing may preference some students over others must be considered. Several studies point to a distinct female advantage in writing that is consistent across grade levels. Gambell and Hunter (1999) found that male students in elementary, middle, and high school were on average significantly weaker in all writing skills compared to females. In addition, female students had more positive attitudes toward writing. The issue of attitude is an important one. According to Banduras (1986) social cognitive theory, childrens judgments of what they can accomplish are important arbiters of their academic accomplishment. This is, in part, because these judgments of confidence, or self-efficacy beliefs, are said to act as mediators between other influences such as skill and previous performance that children bring to an activity, and their subsequent performance. Knudsons (1995) study bears this out in her finding that elementary students who had a more positive attitude toward writing were more likely to be above-average writers. She also found a significant female advantage in writing ability. Pajares, Miller, and Johnson (1999) also found that the self-efficacy beliefs of upper elementary school students were good 127


predictors of their writing performance, even when writing aptitude was controlled. Although there were no significant gender differences in self-efficacy beliefs after controlling for writing aptitude, girls' writing performance exceeded that of boys. Gabrielson, Gordon, and Engelhard (1995) found that gender had a significant effect on high school students' writing content, style, conventions, and sentence formation, with girls outscoring boys on all measures. Gormley's (1993) qualitative study of students' journal writing found gender differences in the style of writing among sixth grade students, though the study did not seek to make any judgments as to the superiority of different styles. Malecki and Jewell (2003) found that girls outscored boys across grade levels on all production-dependent indices (total words written, words spelled correctly, and correct writing sequences), as well as the production-independent indices (percent of words spelled correctly, and percent of correct writing sequences). This finding indicates that girls may have an advantage over boys in assessments that use holistic scoring since they tend to write more text, more accurately, than boys. In a study of children's perceptions of boys' and girls' writing competencies, Peterson (2000) found that grade four students did not favor one gender over the other, but eighth graders perceived girls to be better writers. The teachers involved in the study characterized girls as more competent writers than boys at both grade levels. The results regarding teachers mirror those found in a previous study (Peterson, 1998), with teachers characterizing girls' writing as more sophisticated, detailed, organized, and developed.


Peterson and Bainbridge (1999) also studied teacher perceptions and gendered expectations and found that teachers attempted to ignore the influence of gender perceptions in their blind assessment of students narrative writing. Despite their efforts to ignore gender, they still constructed the writers gender while reading the narratives, allowing for the perpetuation of gender stereotypes and the inequalities they create. Summary of Writing The results of the research reviewed in this section are summarized in Table 6. In brief, the practice of having students write their own problems and justifications for their answers to problems appears to improve their problem solving proficiency (Johnson et al., 1998; Pugalee, 2001; Rudnitsky et al., 1995). Girls and boys differ in their writing styles (Gormley, 1993), with girls scoring higher than boys on measures of writing skill (Gabrielson et al., 1995; Gambell & Hunter, 1999; Knudson, 1995; Malecki & Jewell, 2003; Pajares et al., 1999). Also, both teachers and students have fairly consistent perceptions about girls superior writing ability (Peterson, 1998; Peterson, 2000; Peterson & Bainbridge, 1999). With the expanding role of writing in problem solving, and the advent of holistic scoring rubrics for high-stakes assessments that include linguistic explanation of problem solving processes as a criterion, these findings related to gender are cause for concern and may signal the future reversal of boys current superiority over girls in high-level mathematics. 129


Table 6
Results of Writing Studies Grouped by Topic

Johnson et al., 1998. Participants: 24 5th grade students. Results: Students benefited mathematically from having to write justifications for their answers.

Pugalee, 2001. Participants: 20 9th graders. Results: Children's writing about problem solving showed evidence of metacognitive behaviors.

Rudnitsky et al., 1995. Participants: 401 3rd graders. Results: Students who engaged in writing their own problems performed better on a problem solving test.

Gabrielson et al., 1995. Participants: 34,200 11th grade students. Results: Gender had a significant effect on all measures of writing ability, with girls outscoring boys.

Gambell & Hunter, 1999. Participants: 3,214 students in grades 5, 8, and 11. Results: Females outscored males in all writing skills and had more positive attitudes towards writing.

Gormley, 1993. Participants: 36 sixth graders. Results: Girls' and boys' writing styles are distinctly different.

Knudson, 1995. Participants: 430 students in grades K-6. Results: Girls outscored boys in writing, and students with better attitudes towards writing were more likely to be above-average writers.

Malecki & Jewell, 2003. Participants: 946 1st-8th graders. Results: Girls outscored boys on all measures of fluency and accuracy.

Pajares et al., 1999. Participants: 363 3rd, 4th, and 5th graders. Results: Self-efficacy beliefs about writing were good predictors of writing performance, even when writing aptitude was controlled. Girls' writing performance outscored boys'.

Peterson, 1998. Participants: 174 6th grade teachers. Results: Teachers judged girls' writing as superior to boys' in detail, organization, and development.

Peterson, 2000. Participants: 386 4th and 8th grade students, and their 12 teachers. Results: Fourth graders do not perceive either boys or girls to be better writers, but eighth graders perceive girls to be better writers. Teachers see girls as better writers than boys.

Peterson & Bainbridge, 1999. Participants: 96 teachers. Results: Teachers were unsuccessful in ignoring gender perceptions when grading narrative writing pieces.


How the Review of Literature Informed this Study From the review of research on the subject of gender differences in mathematics, it becomes clear that although studies have not consistently detected significant differences in students achievement at all grade levels, there are complex differences in how boys and girls respond cognitively and affectively to different types of questions and use different strategies to arrive at their solutions. This research sought to add to the body of knowledge about how boys and girls choice of strategies to solve mathematical problems differs. There is extensive research that verifies the connection between childrens reading comprehension and mathematical performance, but a majority of that research focuses on children with reading disabilities and how those disabilities can best be accommodated in testing situations. Because of the current focus on problem solving as the heart of all mathematics instruction, and also because problem solving is dependent on textual communication, further research that addresses the link between reading ability and problem-solving performance is still needed. Gender differences in reading ability complicate this relationship and were a topic of interest in this research. Self-regulated learning is linked to strong performance in mathematics, and self-efficacy, as an element of self-regulation, mediates the relationship between students attitudes about mathematics and their mathematical achievement across age ranges. Calibration, a measure of the accuracy of students self-efficacy beliefs, is an efficient means of observing whether 131


students level of efficacy exceeds or falls short of their actual performance and was a variable of interest in this study. Research into self-assessment shows that regardless of the accuracy of students self-assessments, the simple act of assessing their own work improved their performance. Self-assessment has not yet gained much attention in the literature, but this finding, coupled with other findings that indicate a tendency for males to be more confident than females in their self-assessments, points to a potentially fruitful area for research. This research examined gender differences in the accuracy of childrens self-assessments on a mathematical problem-solving test. As problem solving is becoming more prevalent in high-stakes testing, and students are required more often to provide linguistic explanations of their problem-solving processes, writing in the context of problem solving becomes an increasingly important area for research. Writing, as a general area of research, has historically received scant attention. Moreover, writing as it relates to mathematics has received even less attention. Those who have journeyed into the field found that students benefited from writing their own problems and writing justifications for their answers. This research sought to examine how boys and girls differentially respond to the invitation to provide linguistic explanations of their problem-solving processes. 132


CHAPTER 3

METHODS

The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter contains six sections. The first section describes the design of the study. The second section describes the population and sample selection as well as group categorization for the study. The third section includes discussion of the development and validity of the testing instrument. The fourth section discusses the measures taken to ensure the reliability of the data. The fifth section provides specific details concerning data collection. The final section explains the manner in which the data were analyzed and interpreted.

Design

The intent of this study was to address the following research questions: 1) To what extent are students' choices of mathematical problem-solving processes related to gender?


2) To what extent is childrens performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? 3) To what extent is the quality of students linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? 4) To what extent is the relationship between students feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported rating scale, and their actual performance on a mathematical problem solving test as measured by a holistic scoring rubric related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? 5) To what extent is students ability to assess their own performance on mathematical problem solving tasks as measured by a self-reported rating scale related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? 134


The design of this non-experimental causal-comparative study includes both quantitative and qualitative methods, depending on the question being analyzed. The first research question dealt with the different processes that students chose to solve problems, and was analyzed using qualitative methods coupled with descriptive statistics and chi-square tests. The qualitative analysis was performed using a stance of objectivist grounded theory, allowing the students' responses to define the categories used in the analysis. Grounded theory provided the researcher with a framework for coding data to provide some standardization and rigor (Patton, 2002). Objectivist grounded theory "accepts the positivistic assumption of an external world that can be described, analyzed, explained, and predicted: truth, but with a small t. It assumes that different observers will discover this world and describe it in similar ways" (Charmaz, 2000, p. 524). The remaining four questions were analyzed with quantitative methods using descriptive statistics and a series of factorial ANOVAs coupled with multiple regression. The three independent variables for the factorial ANOVAs were gender (male or female), reading ability (high, middle, or low), and mathematics ability (high, middle, or low). The three independent variables for the multiple regression tests were gender (male or female), reading ability (using the students' FCAT reading scale scores), and mathematics ability (using the students' FCAT mathematics scale scores). The five dependent variables were as follows:


1) Problem-Solving Process Themes: a categorical score for each of the twelve test items that identifies the overarching theme of the process used to solve the problem.

2) Mathematical Performance: a total test performance score as assessed with the FCAT four-point holistic rubric (possible scores range from 0 to 48).

3) Linguistic Explanations: a total score for the quality of students' linguistic explanations rated using a three-point rubric (possible scores range from 0 to 36).

4) Self-Efficacy: a measure of mean bias for self-efficacy (found by the average difference between students' self-reported ratings of self-efficacy and their performance scores). Students' self-reported self-efficacy responses on the five-face scale were converted to a numerical score ranging from zero for the saddest face to four for the happiest face.

5) Self-Assessment: a measure of mean bias for self-assessment (found by the average difference between students' self-reported scores of self-assessment and their performance scores). Students' self-reported self-assessment responses on the five-face scale were converted to a numerical value ranging from zero for the saddest face to four for the happiest face. (A computational sketch of the bias measures in items 4 and 5 follows this list.)
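To make the bias measures in items 4 and 5 concrete, the fragment below shows one way the per-item differences could be computed. It is an illustrative Python sketch, not the SAS code used for the actual analysis, and it assumes that each item's rubric score is on the same 0-4 scale as the converted face ratings and that bias is averaged over the twelve items; the student data shown are hypothetical.

# Illustrative sketch (not the study's SAS code): computing mean bias scores.
# Assumes 12 items, each with a 0-4 rubric score and a 0-4 face rating.

def mean_bias(ratings, rubric_scores):
    """Average signed difference between self-reported ratings and rubric scores.

    Positive values indicate ratings that exceed performance (overconfidence);
    negative values indicate ratings that fall short of performance.
    """
    assert len(ratings) == len(rubric_scores)
    diffs = [r - s for r, s in zip(ratings, rubric_scores)]
    return sum(diffs) / len(diffs)

# Hypothetical student: face ratings given before each item (self-efficacy) and
# after each item (self-assessment), plus the holistic rubric scores.
self_efficacy = [4, 3, 3, 4, 2, 3, 4, 3, 2, 3, 4, 3]
self_assessment = [4, 3, 2, 4, 2, 3, 3, 3, 2, 2, 4, 3]
performance = [3, 3, 2, 4, 1, 2, 3, 3, 1, 2, 4, 2]

print(mean_bias(self_efficacy, performance))    # self-efficacy mean bias
print(mean_bias(self_assessment, performance))  # self-assessment mean bias

Under this sketch, a student whose ratings exactly matched his or her rubric scores would have a mean bias of zero.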


Population and Sample Selection The school district from which the sample was drawn encompasses the entire west central Florida county that has a population of over 270,000. The sample was limited to the east side of the county in an effort to obtain a fairly homogenous sample of students who were relatively successful academically. Due to the number of variables of interest already in the study, the researcher did not wish to add an additional variable of the socio-economic status of the schools. Also, because the ability of the researcher to perform in-depth analysis on student work was dependent on the students ability to produce work that could be analyzed, the researcher chose to limit the study schools to those with a history of high performance on the FCAT test. The study county is experiencing enormous growth, primarily in the previously undeveloped east side, with the result that the socio-economic status of most east county residents is upper-middle class. In the older west county areas, very affluent areas abut low-income areas resulting in a wide breadth of backgrounds in each schools student population. Although the percent of students receiving free or reduced lunch ranges from 4% to 95% county-wide, all three schools in the study sample provide free or reduced lunch to 10% or less of their student bodies. The percentage of students with Limited English Proficiency at the three schools ranges from 3.5% to 5.1% (County School Board, 2004). All three schools have an A grade in the state of Florida, and ranked in the top three for the county in both reading and mathematics for two of the three grades tested in the 2003 administration of the Florida Comprehensive Assessment Test (FCAT Results, 137


2004). School grades range from A to F and are determined based on how well students are doing, how much progress they have made, and how much progress struggling readers are making. All of these areas are measured via the FCAT reading, writing, and mathematics sections (State of Florida Department of Education, 2005). All schools in the county use mathematics textbooks from the State Adopted Mathematics Textbook List that have also been approved by the countys school district and address problem solving in keeping with the guidelines of the Sunshine State Standards (State of Florida Department of Education, 1996). For the year of the study, schools 1 and 2 used Houghton Mifflins curriculum, and school 3 used McMillan McGraw curriculum. In these elementary schools, mathematics is generally taught one hour per day, and teachers are provided with a curriculum guide as well as a textbook to structure their instruction. The decision to test fifth-grade students was made for a variety of reasons. The goal was to study the youngest students possible who would likely provide the data desired. Because one of the research questions was focused on analyzing students linguistic explanations of their problem solving processes, primary-grade students were eliminated from consideration as their writing skills are not well enough developed. In addition, the simplicity of the word problems that would be appropriate for primary-age students would not likely result in the rich and varied approaches to problem solving the researcher hoped to obtain for analysis. Much work has been done with young students in the area of gender 138


differences in problem solving processes through collection of observational data (Carr et al., 1999; Carr & Davis, 2001; Fennema et al., 1998), but upper elementary students have been largely overlooked as participants in these types of studies. Students in middle and high school were not of particular interest because much work has already been done regarding gender differences in achievement and problem solving (Ai, 2002; Ethington, 1992; Fierros, 1999; Gallagher et al., 2000; Lane, Want, & Magone, 1996; Pomplun & Capps, 1999; Webster & Henriksson, 2000; Zambo & Follman, 1994). Also, because the emphasis of this proposed research is more on understanding the underlying development of gender differences rather than differences in performance that result from developmental causes, younger students were a more appropriate population to study. Gender differences in achievement do not often appear until the middle to high school years, but the roots of those differences must be present in the elementary grades. Although most previous research on self-efficacy and self-assessment used middle school through college students, research indicates that upper-elementary students are capable of differentiating their competence across various activity domains (Eccles, Wigfield, Harold, & Blumfield, 1993; Stipek & Gralinski, 1991). By studying fifth-grade students, the data they provided was sufficient for analyzing gender differences in their problem solving processes and linguistic explanations, leading to a better understanding of the formative stages of differences that will later moderate overall achievement. 139


The general rule in quantitative study is to use the largest sample possible, particularly when using a subgroup analysis. Statistical power, the probability of detecting a significant effect when the effect really does exist in nature, is heavily dependent on sample size (Stevens, 1999). The effect sizes likely to be found in a majority of social sciences research are small (.20) to medium (.50) (Cohen, 1977). At a .05 level of significance (α), with a .30 effect size, and eighteen groups (2 levels of gender x 3 levels of reading ability x 3 levels of math ability), the total sample size needed to achieve power of .80 is 468. With an estimated .50 effect size, the sample size needed to achieve power of .80 drops to 189. The three schools that were included in the study had a total fifth-grade population of 396 students. To have adequate power to detect any significant gender differences that existed, all effort was made to ensure a high percentage of participation. If an insufficient number of students' parents allowed their children to participate, the study would have been underpowered according to the a priori power analysis, and if no significant effects were obtained, then the possibility would have existed that effects might truly exist in nature, but were not detected because of the low number of participants.
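For readers who want to run this kind of a priori power analysis themselves, the fragment below is one illustrative way to do it in Python with statsmodels. It is not the calculation behind the figures above: statsmodels treats the eighteen cells as a one-way layout and expects the effect size as Cohen's f, so its output will not necessarily match the totals of 468 and 189 reported here, which rest on the original study's own assumptions.

# Illustrative a priori power analysis (not the study's original calculation).
# Assumes a one-way ANOVA approximation of the 2 x 3 x 3 design and an
# effect size expressed as Cohen's f.
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
for f in (0.30, 0.50):
    n_total = analysis.solve_power(effect_size=f, alpha=0.05, power=0.80,
                                   k_groups=18)
    print(f"effect size f = {f}: total N needed = {n_total:.0f}")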


The independent variables included in this study were gender, reading ability, and mathematics ability. Gender data were gathered during test implementation based on students' self-report on the test cover page. Each student's reading and mathematics ability was categorized according to the following procedure. At the time of test administration, each classroom teacher was provided with a form (Appendix G) on which to rank each participating student's reading and mathematics ability as either high, middle, or low, based on professional judgment. Research evidence shows that teacher judgments of student ability are reliable, accurate, and free of gender bias (Desoete, Roeyers, & Buysse, 2001; Feinberg & Shapiro, 2003; Helwig, Anderson, & Tindal, 2001). However, since the study was conducted in the first half of the school year, the teachers were at a disadvantage in assessing their students. For that reason, students' prior year's mathematics and reading level scores on the Florida Comprehensive Assessment Test (FCAT) were included as criteria in the classification procedure. For the FCAT test, a score of three, four, or five is considered to be indicative of meeting adequate yearly progress. For this reason, all students who scored a three or above on the FCAT were categorized as either high or middle for this study. A score of 5 indicates very high performance; therefore all students with FCAT scores of 5 were categorized as high for this study, regardless of teacher rating. Students who scored a 4 on the FCAT were categorized as high only if they received a teacher rating of high. Students with middle or low teacher ratings and an FCAT score of 4, together with all students with an FCAT score of 3, were categorized as middle. A level score of 1 or 2 indicates that the student had not met the grade level expectations of the Sunshine State Standards (State of Florida Department of Education, 1996) and was in danger of failing. There are rare instances when, due to illness or various stresses, a competent student will receive an unsatisfactory FCAT score. A teacher rating of high for a student with a 1 or 2 FCAT score is an indicator of


this type of situation. For this reason, a student who received an FCAT score of 1 or 2 and a teacher rating of high in a content area was categorized as middle for the purposes of this study. Students who scored 1 or 2 and received a middle or low teacher rating were categorized as low for the study. For a complete explanation of the five FCAT achievement levels, refer to Appendix E. Table 7 illustrates how the categorization criteria were applied.

Table 7
Categorization of Student Ability

Teacher rating High: FCAT 5 = High; FCAT 4 = High; FCAT 3 = Middle; FCAT 2 = Middle; FCAT 1 = Middle
Teacher rating Middle: FCAT 5 = High; FCAT 4 = Middle; FCAT 3 = Middle; FCAT 2 = Low; FCAT 1 = Low
Teacher rating Low: FCAT 5 = High; FCAT 4 = Middle; FCAT 3 = Middle; FCAT 2 = Low; FCAT 1 = Low

The researcher acknowledges that this categorization process has its limitations and may force students who vary by minor differences into separate ability categories. The process does, however, allow for statistical testing that compares groups of students, allowing for conclusions to be drawn and discussed in language that will hopefully benefit teachers and teacher educators. To offset the limitations of this categorization procedure, the data were also examined continuously through multiple regression.
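Because the rule in Table 7 is purely mechanical, it can also be expressed as a short function. The sketch below is illustrative Python; the function name and inputs are hypothetical, and nothing in the study indicates this step was automated.

# Illustrative encoding of the Table 7 categorization rule (hypothetical helper,
# not part of the study's actual procedure).

def ability_category(fcat_level, teacher_rating):
    """Combine an FCAT achievement level (1-5) with a teacher rating
    ('high', 'middle', 'low') into the study's ability category."""
    if fcat_level == 5:
        return "high"                              # very high FCAT: always high
    if fcat_level == 4:
        return "high" if teacher_rating == "high" else "middle"
    if fcat_level == 3:
        return "middle"                            # meets expectations: middle
    # FCAT levels 1 and 2: below grade-level expectations
    return "middle" if teacher_rating == "high" else "low"

print(ability_category(4, "middle"))  # -> middle
print(ability_category(2, "high"))    # -> middle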


Students' gender, along with their FCAT scale scores (ranging from 100-500) for mathematics and reading, were used as the independent variables in further analysis of the data for questions two through five. SAS statistical software was used to perform all of the statistical tests.

Participant Demographics

Participation Levels by School and Gender

The population that this study sampled consisted of 396 fifth-grade students from three schools in a west-central Florida school district. All students in the sixteen fifth-grade classes in the three schools were invited to participate. Of the total population, 286 students, composed of 163 boys (71% of the boys in the population) and 123 girls (73% of the girls in the population), elected to participate in the study and obtained parental permission to do so. All of the 286 students who obtained parental permission were tested using the test instrument developed for the study (Appendix J). Teacher ratings of the students' mathematics and reading ability were obtained for all participating students. As indicated by the teachers, only 3 of the 286 students were classified as ESOL (English for Speakers of Other Languages), signifying that language issues did not play a significant role in the study. None of the ESOL students requested or received any accommodations during testing. Table 8 shows the participation levels of boys and girls by school. The data show that although more boys than girls elected to participate in the study, this is primarily due to the fact that there were more boys than girls in the study


schools. School 3 had the highest participation rate of the three schools at 77.6%, followed by school 1 at 71.2%, then school 2 at 67.6%. School 2 is the newest of the three schools and experiences the greatest rate of influx of students new to the area.

Table 8
Participation Levels of Boys and Girls by School

School 1: Boys 49 of 67 (73.1%); Girls 30 of 43 (69.8%); Boys and Girls 79 of 111 (71.2%)
School 2: Boys 46 of 77 (59.7%); Girls 50 of 65 (76.9%); Boys and Girls 96 of 142 (67.6%)
School 3: Boys 68 of 84 (81.0%); Girls 43 of 60 (71.7%); Boys and Girls 111 of 143 (77.6%)
Total: Boys 163 of 228 (71.5%); Girls 123 of 168 (73.2%); Boys and Girls 286 of 396 (72.2%)

Treatment of the Data

Due to the scheduled changing of classes in one school, two students did not finish the test. Since this represented an anomaly that would likely result in outlying scores, these students were eliminated from the study. Complete prior year's FCAT reading and mathematics scores could not be obtained for 45 of the remaining 284 students. All students were tested for this study between the second week of November and the first week of December, and requests for test


scores were made within one week of the completion of testing. Complete FCAT scores were unavailable for three groups of students: those who had not attended a Florida school for fourth grade (no scores were available), those who moved away between the time they were tested and the test scores were requested, and those who attended a different Florida school in fourth grade. For this last group of students, their cumulative folders only contained FCAT level scores and Norm Reference Test scores. The scale scores that were used as an independent variable for all regression analyses were not reported. In order to be considered for inclusion in the analysis, each student had to finish the test, be rated for mathematics and reading ability by her/his teacher, and have FCAT scores for reading and mathematics on record with the school. Only 239 of the 286 participants (131 boys and 108 girls) met all three criteria. The next step in finalizing the sample was to identify all students whose scores were outliers. For the purposes of the ANOVA analysis, an outlier was defined as a student whose score on any dependent variable was more than 2.7 standard deviations away from the group mean. In a normal distribution, less than 4 in 1,000 meet this criteria for outliers (Glass & Hopkins, 1996). A review of the box and whisker plots and descriptive statistics for the sample groups revealed a total of six instances of student scores that met the criteria for outliers for the dependent variables of mathematics performance, linguistic explanation, self-efficacy mean bias, and self-assessment mean bias. Further analysis found that the six outlying scores were attributable to only three students. Student 1 was a boy in the low reading ability group and the 145


middle math ability group. His scores were outliers for both total mathematics performance and self-efficacy mean bias. Student 2 was a boy in the high ability group for both mathematics and reading. A review of his raw data indicated that he marked the saddest face for self-efficacy and self-assessment for every test item, yet he scored relatively well on all questions. As a result, his self-efficacy and self-assessment mean bias scores were extremely low. Student 3 was the only student among the 286 participants who was categorized in the high mathematics group and the low reading group. His mathematics performance and linguistic explanation scores were extremely high among the low readers, resulting in his scores being outliers for both variables. Because these three students scores were anomalous and would not contribute to an understanding of the population as a whole, their inclusion in the data analysis could lead to spurious results. For that reason, they were eliminated from the analysis. One of the three outliers was already eliminated from inclusion in the analyses due to missing FCAT scores. When the remaining two outlying students were removed from the multiple regression analysis, no other students scores approached a Cooks D value of 1. Cooks D is a measure of how much the regression coefficients would change if a participant was eliminated, and any value greater than one is considered large (Stevens, 1986). This finding confirms the removal of only the outliers revealed by the descriptive statistics. There were three instances of missing data for the students, two for self-efficacy and one for self-assessment. For all three cases, the missing data were estimated using the following process. The first step was to identify all items for 146


which the student achieved the same mathematics rubric score as the item that had missing data. The average of the student's self-efficacy or self-assessment scores for all the identified items was used to estimate the value of the missing score. The only instance of missing data for the teachers was in the ratings of their students' preparedness for successfully completing each item. Due to a copying error, one teacher's rating sheets only contained the first seven of the twelve test questions. As a result, no ratings were obtained at the time of test administration for the last five test items for this teacher. The purpose of acquiring the data at the time of test administration was to ensure that the teachers' assessment of their students' preparedness could be matched with their students' performance. Due to the time lapse between the test administration and the discovery of the missing data, the researcher determined that omitting the data from the analysis would be a better course of action than acquiring new data that may not be trustworthy.
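The student-level estimation just described is a simple conditional mean imputation, and the sketch below shows one way to express it. This is illustrative Python under stated assumptions (scores stored per item as parallel lists, with None marking the missing rating), not the procedure's original implementation.

# Illustrative sketch of the same-rubric-score imputation described above
# (hypothetical data layout; not the study's original implementation).

def impute_rating(ratings, rubric_scores, missing_index):
    """Estimate a missing face rating from the student's own ratings on items
    that received the same mathematics rubric score as the missing item."""
    target_rubric = rubric_scores[missing_index]
    matches = [r for i, (r, s) in enumerate(zip(ratings, rubric_scores))
               if i != missing_index and r is not None and s == target_rubric]
    return sum(matches) / len(matches) if matches else None

# Hypothetical student with a missing self-efficacy rating on item 6 (index 5).
rubric = [3, 2, 3, 4, 1, 3, 2, 3, 4, 2, 3, 1]
efficacy = [4, 3, 3, 4, 2, None, 2, 4, 4, 3, 3, 1]
print(impute_rating(efficacy, rubric, 5))  # mean of the ratings on other items scored 3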


Final Sample Demographics

The final sample for the analyses consisted of 237 students (108 girls and 129 boys). A decision was made to use listwise deletion to ensure that the sample size was the same for the entire study so that conclusions could be drawn across research questions and analyses. Of the 284 students who completed the test, a total of 47 students were eliminated from the analysis, 45 for missing data, and 2 for being outliers. More boys than girls were deleted from the sample, but there were more boys than girls in the study schools and in the original participant pool. The mean scores for performance, linguistic explanations, self-efficacy, and self-assessment for the eliminated students were compared with the scores for the remaining sample. The students who were omitted from the study scored an average of 28.99 points for performance, compared to 26.76 points for the remaining sample. The mean linguistic explanation score for the deleted students was 16.01, compared to 14.2 for the final sample. The deleted students were slightly less self-efficacious, with a mean total score of 33.68, compared to the sample mean of 35.73. However, the deleted students assessed their work higher (M = 37.39) than the remaining sample (M = 36.97). Cohen's d (Cohen, 1977) is an effect size measure used to compare the differences in the means of two groups in standard deviation units. The rough guidelines for interpreting the effect size are that a measure of 0.2 is small, a 0.5 effect size is medium and would be apparent to the researcher, and an effect size of 0.8 is large. The effect sizes for the differences in the means between the final study sample and the students omitted from the study were 0.31, 0.25, 0.27, and 0.06 for performance, linguistic explanations, total self-efficacy, and total self-assessment, respectively. All of these effect sizes are in the small to medium range, and do not pose a threat to the validity of this study.
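For reference, the sketch below shows the usual computation of Cohen's d with a pooled standard deviation. It is an illustrative Python fragment with made-up group scores; the group standard deviations needed to reproduce the effect sizes reported above are not given here, so the example does not recreate those values.

# Illustrative computation of Cohen's d (pooled standard deviation version).
# The example values are hypothetical and do not reproduce the study's figures.
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Standardized mean difference between two groups."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

omitted = [33, 27, 30, 26, 35]            # hypothetical performance totals
retained = [30, 25, 28, 22, 27, 31, 24]   # hypothetical performance totals
print(round(cohens_d(omitted, retained), 2))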


As predicted from the categorization of the small sample in the pilot study, the cell sizes were unbalanced. Not surprisingly, no students in the final sample were categorized as having high ability in one area and low ability in the other. A total of 170 students (72%) in the sample fell in the diagonals representing identical categorization for both subject areas. Table 9 illustrates the fact that there is a disproportionately high number of boys in the high/high cell, and more girls than boys in the low/low cell. Also of note is the fact that only two girls were categorized higher in mathematics than in reading, whereas thirteen boys fell into that category. Conversely, more girls (28) than boys (24) were categorized in a higher reading level than mathematics level.

Table 9
Cell Distribution of Sample Students

Girls (n = 108):
High reading: High math 17 (15.7%); Middle math 13 (12.0%); Low math 0 (0.0%)
Middle reading: High math 2 (1.9%); Middle math 46 (42.6%); Low math 15 (13.9%)
Low reading: High math 0 (0.0%); Middle math 0 (0.0%); Low math 15 (13.9%)

Boys (n = 129):
High reading: High math 31 (24.0%); Middle math 12 (9.3%); Low math 0 (0.0%)
Middle reading: High math 7 (5.4%); Middle math 52 (40.3%); Low math 12 (9.3%)
Low reading: High math 0 (0.0%); Middle math 6 (4.7%); Low math 9 (7.0%)

Note. Percentages were computed according to gender.

Table 10 shows the mean mathematics and reading FCAT scale scores and levels for the sample. The table presents the data grouped by school, by gender, and for the total sample. The data indicate that the students from school


three had the highest scores for both mathematics and reading, and, as Table 8 shows, this is also the school that had the highest percentage of participation.

Table 10
Mean FCAT Scores for Sample Grouped by School and by Gender

By school:
School 1: Mathematics scale 331.3, level 3.18; Reading scale 342.8, level 3.50
School 2: Mathematics scale 323.1, level 2.97; Reading scale 340.3, level 3.45
School 3: Mathematics scale 337.5, level 3.29; Reading scale 347.8, level 3.65

By gender:
Boys: Mathematics scale 338.6, level 3.33; Reading scale 342.4, level 3.52
Girls: Mathematics scale 322.1, level 2.94; Reading scale 345.8, level 3.56

Total: Mathematics scale 331.1, level 3.16; Reading scale 344.0, level 3.54

Note. FCAT scale scores have a range of 200-500. FCAT level scores range from 1 to 5. N = 237 total, 129 boys, 108 girls, 68 students from school 1, 76 from school 2, and 93 from school 3.

School 2 students scored the lowest among the three schools, and this school had the lowest percentage of participation. The boys in the sample had higher mathematics scores but lower reading scores than the girls, mirroring the distribution of the sample in the ability-level cells.


Instrumentation Description of the Testing Instrument The test that was administered to the participants included 12 constructed-response mathematical problem-solving items. A copy of the test is located in Appendix H. The student work space shown in the appendix examples is smaller than the work space on the actual test. In the appendix, the size of the work space was reduced in order to accommodate the appendix heading. The cover page included spaces for students to enter their name, teacher, school, and gender. After the test was administered, each student was assigned an alpha-numeric code that was entered on the cover page and at the top of the first test page. At that time, the cover page was removed and stored separately from the test documents so that each students identity was protected throughout the data analysis process. The test was formatted such that only one question appeared on each page. In the header area of each page is a small box labeled DO NOT WRITE IN THIS AREA that was used to record students alpha-numeric identification code together with their scores on that question. The problem statement was presented at the top of the page, followed by a box designed to obtain students self-reported measure of self-efficacy. After the work space at the bottom of the page was another box designed to obtain students self-assessment score. Measuring Self-Efficacy and Self-Assessment The self-efficacy directions state, Before you begin working, circle the face that best shows how well you think you can solve this problem. The 151


instructions were followed by a series of five faces ranging from very happy to very sad. The self-assessment directions stated, When you are done working, circle the face that best shows how well you think you did on this problem. These instructions were followed by the same series of five faces. The number of faces used in these measurements was determined through both a review of similar research and results of the pilot study. Boekarts, Seegers, and Vermeer (1995), and Vermeer, Boekaerts, and Seegers (2000) are two studies that measured students confidence during the process of solving individual problems. Their testing was conducted via computer, and students were prompted to select one of five faces to rate their confidence level after reading the problem (self-efficacy), at 40-second intervals while solving the problem, and when they were finished solving the problem (self-assessment). By duplicating the method they used to measure self-efficacy and self-assessment, the results of this research add to their findings. Only four faces were used in the pilot study in an effort to keep students from over-selecting a neutral rating. However, follow-up interviews conducted with a small sample of the pilot students revealed some ambiguity in selection of the slightly happy face versus the slightly sad face. Although students explanations for their selection of the happiest and saddest faces were clear and consistent, their explanations for selection of the two middle faces yielded similar and overlapping descriptions. For example, students described their selection of the slightly happy face for the self-efficacy question, as I wasnt sure that I was going to do well or not and I might do well on it, might not. For the slightly sad 152


face, some of the responses were I thought that I wasnt going to do that well but I had a chance to get it right, and Not too well but not sure. Similar responses were given for the two middle faces with the self-assessment scoring. Adding a fifth face in the middle position with a neutral, straight-line mouth, increased the likelihood that students would view the slightly happy face as representing a primarily positive response, and the slightly sad face as representing a primarily negative response. Development of the Test The sources for potential test items were the fourth-grade released items from the 1993, 1996, and 2003 administrations of the National Assessment of Educational Progress (National Center for Education Statistics [NAEP], 2003), and the fifth grade released items from the 2001 FCAT (State of Florida DOE, 2001). Questions from earlier versions of the NAEP test were not considered because they were written before publication of the Curriculum and Evaluation Standards for School Mathematics (NCTM, 1989), and may or may not reflect evidence of mathematics reform. The goal was to select problems from existing released items of tests that had undergone extensive piloting, reliability and validity checks to enhance the validity and reliability of the study test instrument. Because the study participants were fifth-grade students in the first half of the school year, the fourth-grade NAEP questions (generally administered in the second half of fourth grade) were determined to be at an appropriate level of difficulty. Also, because the study sample was children in Florida schools who were learning mathematics under the guidelines of the Sunshine State 153


Also, because the study sample was children in Florida schools who were learning mathematics under the guidelines of the Sunshine State Standards, questions from the FCAT (Florida Comprehensive Assessment Test) that were designed to assess students' mastery of the state standards were considered for inclusion. The selection process began with the identification of all questions that met the following four criteria:

1. The problem was set in a real-world context.
2. The problem was already formatted as a constructed-response item or was structured such that simply removing the multiple-choice answers made the problem suitable for constructed-response format.
3. No manipulatives or tools were required to solve the problem.
4. The problem lent itself to student explanation of the solution process.

The 53 problems that met all four criteria were then categorized as assessing one of the following five content standards: number and operations, algebra, geometry, measurement, and data analysis and probability (NCTM, 2000). Not only did a majority of the selected test questions focus on number and operations (29 out of 53), but this standard is also the dominant focus of elementary mathematics. Number and operations is also a vital part of much of students' problem-solving efforts throughout elementary school; therefore, students' number and operations problem-solving performance in fifth grade is less a matter of specific instruction from a specific teacher than it is a reflection of the students' entire mathematics education experience.


The standard of geometry would not have been a good choice for this study because student performance in geometry is directly related to specific instruction and may have produced a teacher effect that would have confounded the results (Crowley, 1990). For these reasons, the list of potential problems was narrowed to include only number and operations questions. The next goal was to select a subset of the remaining 29 problems that would represent a diverse range of topics within the number and operations strand, represent a wide range of difficulty, and provide a variety of contexts, yet be small enough that administration of the test could be completed in about one hour. Twelve problems were selected that were representative of the breadth of the number and operations standard, ranged in difficulty from a low of 8% correct to a high of 60% correct based on NAEP and FCAT data, included a range of readability from 2.02 to 3.99, and were fairly well divided among the possible types of gendered labeling. Allowing an average of five minutes per problem for twelve problems also meant that the length of the test would likely be appropriate. In the pilot administration of the test, the average length of time taken was 33 minutes, with a median of 29 minutes and a range of 16 to 56 minutes. Based on this information, the test length was deemed appropriate. Many of the questions required slight modification. Multiple-choice items were converted to constructed-response format through removal of the answer choices and rewording of the problem text to eliminate reference to the answer choices. All problems were amended to include the statement, "Show your work, then explain how you know your answer is correct," following the original problem text. No other changes were made to problem context, numbers, or original wording.
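Viewed operationally, the narrowing from 53 candidate items to the final 12 is a screen over a small set of item attributes (content strand, historical percent correct, readability, gendered labeling). The short Python sketch below illustrates one way such a screen could be expressed; the candidate records, field names, and thresholds are hypothetical stand-ins that merely echo the ranges reported above, not the researcher's actual procedure.

```python
# Hypothetical illustration of the item-screening step described above.
# The candidate records are invented stand-ins, not the real NAEP/FCAT pool.

candidates = [
    {"name": "Flower pot", "strand": "number and operations",
     "pct_correct": 50, "readability": 2.77, "label": "M"},
    {"name": "Spinner", "strand": "data analysis and probability",
     "pct_correct": 42, "readability": 3.10, "label": "N"},
    # ... remaining released items would be listed here ...
]

def keep(item):
    """Retain only number-and-operations items whose difficulty and
    readability fall inside the ranges targeted for the final test."""
    return (item["strand"] == "number and operations"
            and 8 <= item["pct_correct"] <= 60
            and 2.0 <= item["readability"] <= 4.0)

pool = [item for item in candidates if keep(item)]
print([item["name"] for item in pool])
```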


Table 11 lists each problem by name, and includes the concept it assesses, the source from which the problem was taken, the percent of students who originally answered the problem correctly, the readability level of the question (Homan et al., 1994), the gendered labeling of the problem (Walsh et al., 1999), and the original format of the problem before it was modified for this study. Problems involving a male character were considered male-labeled, problems using a female character were considered female-labeled, items including mention of both males and females were considered both male- and female-labeled, and problems with no reference to males or females were considered neutral. The problems are listed in the table in the order of their placement on the pilot test. The order of the problems was determined by purposeful placement of a few of the easier problems (according to data of percent correct) at the beginning. In order to increase the likelihood that students would become engaged in the test and persist through the twelve items, it was necessary to purposefully place some of the easier items (based on historical data) at the beginning of the test (D. R. Thompson, personal communication, November 17, 2003). Although random ordering could have been applied, the researcher determined that the negative impact on student engagement of potentially having the most difficult items appear at the beginning of the test overshadowed the potential benefits of random ordering. The remaining problems were arranged in an effort to separate those that were particularly difficult (based both on percent correct and readability), those that contained a graphic, and those that tested similar content or required similar skills.


Table 11
Description of Test Questions Selected for the Study

Question Name | Content | Source | % Correct | Readability (a) | GL (b) | OF (c)
Flower pot | Multiplication with interpretation of a graphic | NAEP, 1996 | 50 | 2.77 | M | MC
Ms. Hernandez | Division and interpretation of the remainder | NAEP, 1996 | 39 | 2.35 | F | CR
Calories | Estimation with rounding and addition | NAEP, 1992 | 45 | 3.61 | N | CR
Jean's class | Proportional reasoning and addition | NAEP, 2003 | 34 | 3.05 | B | MC
Pizza | Conceptual understanding of fractions | NAEP, 1992 | 24 | 2.54 | B | CR
String | Division of fractions | NAEP, 2003 | 27 | 2.70 | M | MC
Balloons | Estimation using interpretation of a graphic | FCAT, 2001 | 60 | 3.61 | N | CR
Flour | Multiplication of fractions | NAEP, 1992 | 21 | 3.01 | N | MC
Calculator | Place value | NAEP, 1992 | 20 | 2.23 | F | CR
School lunch | Decimal multiplication and addition | NAEP, 1996 | 17 | 3.99 | M | CR
Pencils | Fraction sense and interpretation of a graphic | NAEP, 2003 | 11 | 2.02 | B | CR
Birdseed | Decimal division and proportional reasoning | NAEP, 1992 | 8 | 2.92 | N | MC

(a) Readability is a grade-level estimate based on Homan, Hewitt, & Linder (1994). (b) GL = gendered labeling, where M = male, F = female, N = neutral, and B = both. (c) OF = original format, where MC = multiple choice and CR = constructed response. NAEP = National Assessment of Educational Progress. FCAT = Florida Comprehensive Assessment Test.


A review of students' performance on individual items for the pilot study confirmed the ordering of the items for the full study.

Validity

The validity of a test is generally concerned with the extent to which an instrument measures what it is supposed to measure. More specifically, validity is defined as "the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores" (Ary, Jacobs, & Razavieh, 1996, p. 262). Validity is always specific to the particular purpose for which the instrument is being used, and there are three different types of evidence that can be gathered to support the inferences being made from the scores of a measuring instrument: content-related, criterion-related, and construct-related (American Psychological Association, 1985). Content-related evidence speaks to the extent to which the sample of items on a test is representative of some defined domain of content (Ary et al., 1996). The fact that the questions used in the study test were drawn from the National Assessment of Educational Progress and the Florida Comprehensive Assessment Test, together with the range of content covered by the twelve selected items (Table 11), provides evidence of content validity. Criterion-related evidence shows the extent to which the scores on a measuring instrument are related to an independent external variable (criterion) believed to


measure directly the behavior or characteristic in question (Ary et al., 1996, p. 265). A criterion measure must demonstrate relevance and freedom from bias. Evidence of relevance was achieved by a careful comparison of the test items to the content appropriate for fifth-grade students (NCTM, 2000). Freedom from bias is achieved if the scoring of a criterion measure is not influenced by any factors other than actual performance on the criterion. As the test items were scored according to an established rubric, and the test papers did not contain any information about the student that would potentially impact the scoring, the resulting scores are deemed to be free from bias. Construct-related evidence focuses on the performance, linguistic explanation, self-efficacy, and self-assessment scores as measures of a trait or construct (Ary, 1996). In this instance, the measurement of problem solving ability, quality of linguistic explanations, self-efficacy, and self-assessment should be as independent as possible from the measure of any other construct. The measures of self-efficacy and self-assessment follow the work done by other researchers, and are believed to possess construct-validity. However, as the research literature reviewed in Chapter 2 demonstrates, problem-solving ability cannot be measured independent of reading ability. Additionally, when linguistic explanations are required, as with this and other problem-solving tests, and scoring rubrics include the explanation as a criterion, the measurement of problem-solving ability is not independent of the measurement of writing ability. The testing and scoring instruments that were used in this study were intended to be representative of the types of instruments currently being used to assess 159


students for the purposes of making high-stakes decisions about their academic futures. Given that this research was designed to imitate the current assessment situation using instruments that admittedly present threats to construct validity, the researcher included the two confounding variables as variables of interest in this study: reading ability as an independent variable, and quality of linguistic explanations as a dependent variable. This research attempted to determine the extent to which reading ability moderates students problem solving scores. Also, because the problem solving rubric did include students linguistic explanations as a criterion, the linguistic explanations were scored independent of the problem solving score to determine the extent to which students ability to explain their thinking may have impacted their problem solving scores. Reliability of the Data Cronbach Alpha The reliability of a measuring instrument is the degree of consistency with which it measures whatever it is measuring (Ary et al., 1996, p. 273). One way to measure reliability involves assessing a tests internal consistency, the extent to which all test items are measuring the same thing. The Cronbach alpha coefficient, named after Lee Cronbach (1951), is used to measure internal consistency when the test items are not scored simply as right or wrong, but are given a range of scores. Because the items on the study test were scored on a scale of zero to four for performance, and zero to three for linguistic explanations, the Cronbach alpha is an appropriate measure of reliability. The Cronbach alpha 160


coefficient for the pilot administration of the test for mathematical performance (n=23) was .89, and for linguistic explanations was .75. These numbers are considered satisfactory following the guidelines of Guilford and Fruchter (1978). The Cronbach alpha coefficient was computed for mathematical performance, linguistic explanations, mean bias of self-efficacy, and mean bias of self-assessment for the full study. The raw coefficients for each of these variables were .79, .87, .83, and .79 respectively. These numbers are considered satisfactory following the guidelines of Guilford and Fruchter (1978), and indicate that for these four variables, the test instrument produced scores that had an acceptable level of internal consistency. Interrater Reliability Training and Scoring Another reliability issue is the consistency of the scoring of test items. To measure the extent to which the researcher accurately and reliably applied the rubrics for scoring mathematics performance and linguistic explanations, a stratified random sample of twenty-five test papers was doubled-scored. Prior to any work being done by the second scorer, two training sessions were conducted by the researcher. The second scorer was a mathematics education professional with extensive experience in elementary mathematics content and pedagogy. The training process began with a general discussion of the anchor papers selected from student work in the pilot study, the general rubrics for performance and linguistic explanations (Appendices A and C), and the item-specific rubrics for performance and linguistic explanations (Appendices K and 161


L). Next, one complete student test from the pilot study was selected at random for review and discussion. The researcher and second scorer discussed the students work for each test item and how each response would be scored according to each of the rubrics. The researcher then selectively sampled two additional pilot test papers that reflected a wide variation in performance scores based on the pilot test results. These papers were deemed likely to provide a more complete practice experience in the scoring process than test papers for which the students scored similarly on all test items. In between the two training sessions, both the researcher and the scorer independently scored each of the twelve items for each of the two tests for quality of performance and linguistic explanations. The second training session was used to discuss all disagreements and arrive at a consensus. The researcher and scorer then proceeded with the independent scoring of 25 test papers selected from the full study. All interrater reliability scoring and discussion of scoring results was completed before the researcher scored any of the remaining test papers for the study. The 25 test papers that were double-coded were selected using a stratified random sampling process (Patton, 2002) to ensure equitable representation for each school and classroom in the study sample. One test paper was selected at random from each classroom included in the study, resulting in a total of 16 tests. The remaining nine tests were selected by randomly choosing three tests from each of the three schools. All twelve test items for all twenty-five tests were independently scored on both measures by 162


the researcher and the second scorer. Any disagreements between the numerical score assigned to a response by the two scorers were tallied, and a percentage of agreement was calculated. Any disagreements that were a direct result of a lack of clarity in the text of either the general rubric or the item-specific rubrics were tallied separately. The discussion between the researcher and the scorer that resulted from these disagreements led to clarification of the language of the rubrics, thereby enhancing the reliability of the subsequent scoring of all test papers by the researcher. The reliability of the categorization of solution processes was enhanced by having the second scorer assess the solution process of every item for which the process was unclear to the researcher. Any differences of opinion were negotiated with the result of complete agreement for the categorization of all processes that were in question. For the mathematics performance score on the 25 double-scored tests, there was preliminary agreement on 243 of the 300 items (81%). Of the 57 items for which there was not initial agreement, 63% of the disagreements were the result of the researcher assessing a score one point higher than the second scorer. A further discussion of these 57 items resulted in the identification of 31 items for which the disagreement was a direct result of ambiguity in the language of the item-specific rubrics. When the language of the rubrics was clarified, all 31 of these disagreements were rectified, resulting in an overall 91% agreement rate. The remaining 26 disagreements were discussed until consensus was reached. 163
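Both reliability indices reported in this section, the Cronbach alpha coefficients for internal consistency and the interrater percentage of agreement, are simple computations over the item-by-student score matrix. The Python sketch below shows how either could be reproduced; the array values are invented for illustration only and do not correspond to the study data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (students x items) matrix of polytomous scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def percent_agreement(rater_a, rater_b):
    """Proportion of responses on which two raters assigned identical scores."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return (a == b).mean()

# Invented example: 5 students x 4 items scored 0-4 (not the study data).
demo = np.array([[4, 3, 2, 4],
                 [1, 0, 1, 2],
                 [3, 3, 2, 3],
                 [0, 1, 0, 1],
                 [4, 4, 3, 4]])
print(round(cronbach_alpha(demo), 2))

# Invented double-scored item vectors for two raters.
print(percent_agreement([4, 3, 2, 0, 1], [4, 3, 1, 0, 1]))  # -> 0.8
```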


For the linguistic explanation scores, there was preliminary agreement on 213 of the 300 items (71%). Of the 87 items for which there was not initial agreement, fully 76% were the result of the researcher assessing a score one point higher than the second scorer. A further discussion of the 87 items resulted in the identification of 30 items for which the disagreement was a direct result of ambiguity in the language of the general and item-specific rubrics. When the language of the rubrics was clarified, all 30 of these disagreements were rectified, resulting in an overall 81% agreement rate. The remaining 57 disagreements were discussed until a consensus was reached. The clarifications made in the language of the item-specific rubrics and the general rubric for scoring linguistic explanations that resulted from the interrater reliability process were implemented in the scoring of all remaining test papers. An example of one of these clarifications occurred in the item-specific rubric for the question about Jean's class. Throughout the scoring for mathematical performance, a top score of 4 was reserved for a model response that demonstrated a clear understanding of the concept, showed all necessary work, and provided a contextualized explanation. A score of 3 represented a successful response and differed from a score of 4 by, among other things, either omitting the explanation or providing an explanation that lacked context. In the original version of the rubric for this item, the language describing a score of 3 stated "Correct solution with correct work but no explanation" OR "Minor flaw in solution process but explanation clear and correct."


Neither of these possibilities included the situation of a correct solution with correct work and a correct explanation without context. An example of an explanation without context for the Ms. Hernandez problem would be, "I divided thirty-four by eight and got four with a remainder of two." Although this explanation does detail the steps of the computation, it does not relate the computation to the context of the problem of teams and substitutes, and does not satisfy the requirement of the question that states, "Explain how you know you are correct." Therefore, an answer with this explanation is not considered a model response and should receive a score of 3. Because neither the description for a score of 4 nor the description for a score of 3 included this scenario, the scorer was left to wonder how to score this response. To clarify the description, language was added to the descriptor for a score of 3 to include an explanation that lacked context.

Member-Check Interviews

The purpose of the member-check interviews was to ensure the reliability of the researcher's categorization of problem-solving processes and of the students' choices of faces to denote their level of self-efficacy and self-assessment. Four boys and four girls were purposefully selected for the interviews based on a preliminary scoring of their test papers. The goal was to talk with students who performed reasonably well on the test but provided very little linguistic explanation for their work. Although there may have been a greater potential for error in the categorization of problem-solving processes with students who scored very low on the test, there was also less likelihood that


these low students would be able to identify and successfully verbalize their processes. Students who scored well and provided a high level of linguistic explanation for their processes were not selected because their written work combined with their linguistic explanations would likely yield a reliable categorization of their problem-solving processes. To reduce the time period between test administration and the interviews, the selection process began with a cursory perusal of all student work to assess the quantity of linguistic explanations provided and eliminate from consideration all students who provided linguistic explanations for most of their problem-solving processes. Every student whose test paper contained very little in the way of linguistic explanations was considered for the interview. These tests were scored using the performance rubric, and only those students who scored 24 or above on the performance rubric remained eligible for consideration. A score of 24 represents an average of 2 points on each question and indicates some level of problem solving proficiency. The remaining students tests were then scored for linguistic explanations and the final selection was based on those scores. To ensure an equitable representation of interviewees at the three schools, three students were chosen from each of the two larger schools (two boys and one girl from one school, and one boy and two girls from the second school), and two students (one boy and one girl) were chosen from the smaller school. A total of four boys and four girls with a minimum score of 24 for performance and the lowest scores for linguistic explanations were selected for interviewing. An alternate boy and girl were also selected at each school in case the first-choice 166


student would not or could not be interviewed. Seven of the eight original choices were interviewed, but a substitute of one male alternate was required due to teacher request. To reduce the likelihood that students memory failures would impact the results of the interviews, the students own work served as a means to stimulate recall, and the students were allowed time to review their own work before providing a verbal explanation. The time lapse between test administration and interviews was approximately three weeks for the pilot study, and the students were successful in recalling their thought processes when given time to think. The time lapse between test administration and interviews for the full study was an average of two and a half weeks. During each individual interview, the student was shown her/his original test booklet and was asked to first read the question aloud, explain why he/she chose to circle the given face for self-efficacy, explain his/her work for the question, then explain why he/she chose to circle the given face for self-assessment. The interview protocol was repeated for each of the twelve test items, and the interviews were audiotaped and videotaped. The researcher intended to use the audiotape for transcription and the videotape to add the details that could only be perceived through watching the video. However, the poor quality of the audiotape resulted in the videotape being used for both purposes. The interviews were approximately 20 minutes in duration. After all interviews were completed and transcribed, and all tests were scored, the researcher reviewed the edited transcript alongside the students 167


work to determine percentages of agreement for three items. The first was whether the students verbal explanation of his/her problem solving processes matched the descriptions the researcher assigned to the written work. The second and third measures involved the percentage of agreement between the students verbal explanations for their circling of faces for self-efficacy and self-assessment and the language used in the test training to describe the meaning of each face. Problem-Solving Processes The transcripts of the interviewed students verbal explanations for their problem-solving processes were compared to the problem-solving process by which the written explanation had been categorized. Eight students were interviewed and thirteen process descriptions were reviewed for each student. Although there were only twelve questions, the calculator question required students to provide two different methods for solving the problem, resulting in two separate codings for this question and a total of thirteen responses per student. Of the total 104 responses (8 students x 13 responses each), 99 responses were coded accurately representing 95% accuracy. The five errors in coding of problem-solving process were then analyzed to determine how many resulted in an error in the coding of problem-solving theme. Of the five incorrectly coded responses, three did not impact the coding of the theme. Only two of the errors resulted in an incorrect coding for the theme of the process, yielding a 99% accuracy rate for the coding of themes. This high level of accuracy gives the 168


researcher confidence that the system for the coding of process descriptions and themes yielded a reliable representation of students' actual written work.

Self-Efficacy

The next stage of the individual member-check interview review was to compare students' verbal explanations for their selection of level of self-efficacy to the range of students' descriptions of the levels of self-efficacy that were provided during the whole-class training sessions. These training sessions occurred immediately prior to each testing event and were documented through field notes taken by the researcher while the students were testing. The researcher compiled the field notes that were recorded after each session, and organized the students' comments as to why they would circle each face for self-efficacy and self-assessment. The review indicated that the students' verbal explanations for their selection of the appropriate face to circle matched the descriptions of each face provided by students during training in 89 out of 96 cases (93% agreement). An example of an agreement would be a student who said he/she circled the very happy face after reading the question "Because I thought it was easy." This response was in line with the responses provided by students during the pre-test training, such as "It seems really easy," or "I know how to do the problem." Of the seven instances of disagreement, six were situations in which the student circled a face that indicated a level of self-efficacy that was one level higher than their verbal description of their reasoning for circling that face.


An example of this situation is a student who explained that he/she circled the slightly happy face "'Cause I wasn't sure if, if um, I could really do it." This verbal description would align more closely with the middle face that is neither happy nor sad and indicates an uncertainty in ability to complete the problem successfully. The high level of agreement, and the consistency in the direction and severity of the disagreements, indicate that the five-face scale was a reliable measure of students' self-efficacy during the testing event.

Self-Assessment

The last stage of the member-check interview review was to compare students' verbal explanations for their selection of the face that best depicted how well they thought they did on each question (self-assessment) to the range of students' descriptions of the levels of self-assessment that were volunteered during the whole-class training sessions. The review indicated that the students' verbal explanations for their selection of the appropriate face to circle matched the descriptions of the faces provided by students during training in 88 out of 96 cases (92% agreement). An example of an agreement would be a student who said he/she circled the very happy face after completing work on the problem "'Cause I was pretty confident that I got it correct." This response was in line with responses provided by students during the pre-test training, such as "I figured it out and I know I got it right," or "I'm confident I got it right." Of the eight instances of disagreement, six were situations in which the student circled a face that indicated a self-assessment rating that was one level higher than his/her verbal description of his/her reasoning for circling that face. An example of this situation is a student who explained that he/she circled the


slightly happy face Cause I wasnt so sure if it was the right answer. This verbal description would align more closely with the middle face that is neither happy nor sad and indicates an uncertainty as to whether the problem was successfully completed. Some examples of student explanations of the middle face for self-assessment from the pre-test training are: Im not sure if I got it right, or I might have gotten it right but Im not sure. The high level of agreement, and the consistency in the direction and severity of the disagreements, indicates that the five-face scale was a reliable measure of students self-assessment during the testing event. Data Collection In order to answer the research questions, the study test was administered to the fifth-grade students participating in the study. The answer for each test question was evaluated for the quality of the work based on a four-point rubric (see Appendix A for scoring guide), and the quality of the linguistic explanation based on a three-point rubric (see Appendix B for scoring guide). Students indications of self-efficacy and self-assessment on the five face scale were converted to a numerical value ranging from zero to four. Participation in this study was voluntary, required parental permission, and was conducted in accordance with the University of South Florida Office of Research, Division of Research Compliance. All fifth-grade students in the three study schools, with the exception of the students who participated in the pilot study as fourth graders, were invited to participate. Informed consent forms 171


(Appendix H) were completed prior to test administration, and included permission for both the test administration and the follow-up interview. The informed consent documents were sent home with a cover letter (Appendix I) that briefly explained the purpose and procedures of the study, provided a place for a parent to sign if they were not willing to have their child participate, and explained how to complete the attached informed consent document if parental permission was to be granted. Students were given a small, non-edible incentive for returning either the cover letter indicating refusal of parental permission, or a signed informed consent document indicating that parental permission had been granted. Students who did not initially return either document were given another copy of the original cover letter and informed consent forms to take home. Organization of Testing Testing was performed at three elementary schools in Florida, and was conducted with one group of students at a time. Students who were present on the initial testing day were either tested in their regular classroom, in a neighboring teachers room, or in the media center, depending on individual teacher preference. Students who were absent on the initial testing day were tested individually or in groups in either the media center or a private work room. Before testing began, each student was given a test booklet and asked to complete the cover page with his/her name, teacher, school, and gender. When all students had completed that information, they were directed to turn to the first page of the test booklet for review of a practice problem. The purpose of the practice problem page was to familiarize the students with the test format and the 172


self-efficacy and self-assessment scales. The instructions for both the self-efficacy and self-assessment scales were read aloud and explained by the researcher. Students were then led in an open discussion of the conditions under which they might choose each of the five faces for each scale. The researcher kept a journal record of student comments that occurred during this discussion. Students were told that each question asked them to show their work and explain their thinking with the phrase Explain how you know your answer is correct. Students were reminded of the importance of providing complete explanations of their problem-solving processes, and were then encouraged to do their best work and completely answer each question. They were told that during the test administration, the researcher would circulate around the room and point at the students papers whenever they had neglected to circle a face for one of the self-efficacy or self-assessment scales. They were then instructed to raise their hand when they were finished with the test. When all students indicated an understanding of the test format, they were instructed to begin the test. During the test administration, each teacher was provided with an amended copy of the test. The cover page for the test explained that due to the fact that the study was being conducted early in the school year and the teacher may not have covered all material included on the test, the researcher was requesting that each teacher rate each question for the extent to which his/her students were prepared to answer the question. In the body of the test, a five-face scale was placed in the work space for each question with the text Circle 173


the face that best shows how well prepared your students are to answer this question successfully. All teachers were requested to answer the preparedness question for each test item. For the duration of each test administration, the researcher circulated about the room to remind students to circle the self-efficacy and self-assessment faces as they worked. All work was completed individually by each student, and the researcher only answered student questions that related to test format or the self-efficacy and self-assessment scales. Only one student whose Individual Education Plan permitted the reading aloud of test questions requested that accommodation. He was seated away from the other students during the test administration, so that any items read aloud would not be heard by other students. Because the intent of this study was to determine the relationship between reading ability and the dependent variables, and a read-aloud accommodation would have created a confounding variable, this student was removed from the sample. The researcher noted the elapsed time for each students test on the test booklet when it was completed and handed in to the researcher. Based on data from the pilot study, the researcher estimated that test administration would take approximately one hour; however, no time limit was imposed on the students except in one situation. At one of the subject schools, the students changed classes for most subjects. During one testing session, the change of classes took place while two students were still working on the test. As these were the 174


only two students in the study who did not finish, their test results were not included in the data analysis.

Analysis of Data

Upon data collection from all participating students, the data were analyzed to answer each of the five research questions. A summary of the data sources and data analysis for each research question is presented in Table 12.

Research Question One

Analysis of students' problem-solving processes was performed on a per-item basis. For each test item, all student responses were sorted according to the process by which students solved each item (McCoy, 1994; Zhang, Wilson, & Manon, 1999). By way of illustration, some examples of process descriptions for a correct response to the flour problem were:

1 1/3 x 3 with correct answer
Converted fractions to decimals (correctly) then did the computation
Solved strictly with picture/diagram, correct answer
1 1/3 + 1 1/3 + 1 1/3, correct answer

A detailed list of all the processes identified for each question is provided in Appendix M. Before the analysis of problem-solving processes commenced, the researcher intended to conduct a chi-square statistical test for each item to determine whether gender differences in the students' problem-solving processes were statistically significant.


Table 12
Summary of Data Sources and Methods

Research Question | Data Source | Data Analysis
1. Processes | Student work on 12-item test | Qualitative analysis of student work on each item, followed by chi-square tests
2. Performance | Student work on 12-item test | Factorial ANOVA using students' total test score as measured by the FCAT 4-point rubric (0-48 points possible), and multiple regression analysis
3. Linguistic Explanations | Students' explanations of their problem-solving processes | Factorial ANOVA using students' total score for linguistic explanation as measured by a 3-point rubric (0-36 points possible), and multiple regression analysis
4. Self-Efficacy | Students' mean bias score = average of students' self-reported score of self-efficacy minus their performance score on each item | Factorial ANOVA using students' mean bias scores for self-efficacy, and multiple regression analysis
5. Self-Assessment | Students' mean bias score = average of students' self-reported score for self-assessment minus their performance score on each item | Factorial ANOVA using students' mean bias scores for self-assessment, together with multiple regression analysis

For the pilot study, the largest number of processes identified for a single item was twelve. In addition, with the small


amount of data generated from the testing of twenty-three students, the researcher was unable to identify clear themes that cut across all questions. For the full study, the smallest number of processes identified for any one question was twelve, and the sum of all processes for all twelve questions was 232. Conducting chi-square tests for 232 different processes would provide a large volume of data from which no meaningful conclusions would likely be generated. Also, as the data analysis for the full study proceeded, emergent analysis (Patton, 2002) across all items facilitated the identification of three broad categories of solution processes based on success in solving the problem. The broad categories that emerged were Successful Response (S), Partially Successful Response (P), or Unsuccessful Response (U). Under each of the three broad categories, the researcher identified five to six solution process themes for a total of seventeen themes. Six of the seventeen themes were similar to other themes that fell under different broad categories. For example, Computational problems fell under the categories Partially Successful Response and Unsuccessful Response. Each of the two iterations of Computational problems was counted individually as a theme, even though the two iterations represented levels of severity of the same error. A more detailed discussion of the broad categories and solution process themes that emerged as a result of the analysis is presented in Chapter 4. These broad categories and themes were then used as an organizing mechanism for the coding of the 232 individual solution processes used by students to solve the twelve questions. A 177


list of all solution processes for each of the twelve test items along with the category and theme used to code each process is included in Appendix M. Viewing the solution processes through the lens of overarching themes allowed for analysis of the solution processes of boys and girls across all questions instead of strictly by individual test items. This change in methodology resulted in a deeper and richer analysis of the data and allowed for more meaningful conclusions to be drawn about the problem solving processes used by boys and girls. The researcher assessed for convergence and divergence throughout the process of identifying categories and themes and assigning solution processes to themes. To deal with the challenge of convergence, deciding what things fit together, the researcher looked for recurring regularities in the data that revealed patterns that could be sorted into categories. The placement of solution processes into categories and themes was judged according to two criteria: internal homogeneity and external heterogeneity. The first criterion is concerned with the extent to which the responses that belong in a certain category hold together in a meaningful way. The second criterion is concerned with the extent to which the differences between categories are clear. The researcher then worked back and forth between the data and the classification system to verify both the meaningfulness and accuracy of the categories and themes and the placement of solution processes in the categories (Patton, 2002). Because the researcher categorized strictly based on student work, such as whether the student used repeated addition or multiplication to solve the 178


problem, double coding for interrater reliability was not necessary. However, there were many responses for which the students work was not clear. These responses were flagged, and the decision for categorization was made jointly by the researcher and the same second scorer who assisted in the other interrater reliability checks. Upon completion of the categorization process, simple descriptive statistics (frequencies and percentages) were computed to summarize the data for analysis of gender differences. Chi-square statistical tests were then conducted for each question to determine whether gender differences in the students problem solving processes were statistically significant. Additional chi-square tests were conducted by collapsing similar themes. For example, three individual process themes focused on students use of pictures or diagrams to solve the problem; one for a successful response, one for a partially successful response, and one for an unsuccessful response. Chi-square tests were conducted first on each of these three themes individually, then on the data from all three themes combined. Research Questions Two and Three The data analysis processes for the questions of performance and linguistic explanations were conducted in similar fashion. Each test item was scored according to a rubric, but to ensure that the scoring of the two criteria was not confused, all papers for one test item were scored first for mathematical performance and then for linguistic explanation. The performance score was obtained through application of the FCAT mathematics four-point rubric 179


(Appendix A) and item-specific rubrics (Appendix K), together with the use of anchor papers, a sampling of which is provided in Appendix D. The item-specific rubrics were created by the researcher using student responses from the pilot study, and reflected the format of the item-specific rubrics used by the State of Florida in conducting its training for FCAT 5th-grade mathematics scorers (State of Florida Department of State, 2001). The anchor papers were selected from the data collected in the pilot study and exemplified the qualities and characteristics of each possible score (0-4) for each item. Anchor papers were identified for most but not all scores for every item because the small sample in the pilot study did not yield the full range of scores for every item. To provide additional data for later analysis, questions for which the student made no attempt were flagged with a distinct numeral; all items flagged in this way were treated as zero scores for the purposes of data analysis to answer the research questions. The linguistic explanations were scored using the researcher-created three-point general rubric (Appendix B) together with item-specific rubrics (Appendix L). The three-point general rubric was based on criteria established by Ciochine and Polivka (1997). The item-specific rubrics were created using student work from the pilot study. For the purposes of scoring, symbols and numerals embedded in the textual explanation were regarded as text. Students' total scores for performance and writing were used as data for the analyses, but descriptive statistics for performance and writing on a per-item basis have also been reported. The scores for performance could range from 0-48, based on 12


questions scored on a 4-point rubric. The scores for linguistic explanations could range from 0-36, based on 12 questions scored on a 3-point rubric. Samples of student work from the pilot study for a range of scores is provided in Appendix D. Criterion scoring reliability was determined from a sample of 25 randomly selected student tests. All twelve items on these 25 student tests were independently scored for performance and linguistic explanations by another mathematics education professional. Any disagreements between the numerical score assigned to a response by the researcher and the numerical score assigned to that same response by this second scorer were tallied. All disagreements were then resolved. Of the total 300 scores (12 items for 25 tests) for each of the two criteria, a percentage of agreement for interrater reliability was computed. Internal consistency reliability was also computed for performance and writing scores using the Cronbach alpha formula (Ary et al., 1996). For both questions, the data were first examined to determine measures of central tendency and dispersion. The assumption of independence was met by having all students complete their tests individually. The distribution of scores was analyzed to ensure that the assumption of normality for the factorial ANOVA model was not violated. Levenes (Stevens, 1986) test of homogeneity of variances was used to ensure that the assumption of equal variances was not violated. The three-way factorial ANOVA statistical design is a robust procedure that examines the effects of three independent categorical variables on one continuous dependent variable (Stevens, 1999). A factorial ANOVA (2 x 3 x 3) 181


was applied to answer each question, using the students total score for performance as the continuous dependent variable to answer question two, and students total score for linguistic explanations as the continuous dependent variable to answer question three. For the two factorial ANOVAs being used to answer these two questions, the independent categorical variables were student gender (2 levels), reading ability (3 levels), and mathematics ability (3 levels). Given this design, there were potentially eight sources of variation, including three main effects, three two-way interactions, one three-way interaction, and within cells error. Post-hoc analysis included the Tukey procedure for the pairwise group comparisons of all significant main effects with non-significant interaction effects (Stevens, 1999) This procedure is appropriate when the cell sizes are unequal, and the researcher is interested in more than half of all pairwise comparisons. Planned comparisons were conducted for any significant interaction effects. As was the case in the pilot study, the cell sizes for this study were disproportional, with only a small percentage of students falling into the low category for reading and mathematics, and several of the cells having no members. If the disproportionate cell sizes were not indicative of the distribution of the population, the effects could become correlated, and unless these correlations are taken into account, the results could be misinterpreted (Stevens, 1999). However, based on an understanding of the population, the disproportionate cell sizes were expected. Because the review of literature supported the ordering of the independent variables, the hierarchical method was 182


used. Chapter four includes a detailed explanation of the rationale for the method selection and the ordering of the variables. The data for these two questions was also independently analyzed using multiple regression. The independent variables for both questions were student gender, student mathematics FCAT scale scores (ranging from 100-500), student reading FCAT scale scores (ranging from 100-500), the gender and reading ability interaction, and the gender and mathematics ability interaction. This analysis allowed the researcher to draw conclusions as to the relationship between the independent variables and the dependent variables of performance and linguistic explanations, and the extent to which an increase in students FCAT scores yielded an increase in the dependent variables. Research Questions Four and Five The statistical procedures for research questions three and four, dealing with self-efficacy and self-assessment, were nearly identical. To begin, students self-report of self-efficacy and self-assessment was converted from the smiley-face likert scale to a numerical score from zero to four. A score of zero was assigned to the saddest face, a score of four was assigned to the happiest face, and the other faces were scored with the values one through three. The variable used in the three-way factorial ANOVA for each question was a measure of calibration called the mean bias score as described by Schraw (1995), and Yates (1990). Bias reveals the direction of the errors in judgment and is computed by subtracting actual performance from predicted performance. To obtain the bias scores for self-efficacy, the students average performance score was subtracted 183


from his/her average self-efficacy score, resulting in a mean bias score between -4 and +4. Expressing no self-efficacy (score of 0) and providing a completely incorrect answer (score of 0) reflected a zero bias. The same lack of confidence with a completely correct answer (score of 4) resulted in a bias score of -4 (zero minus four), indicating extreme under-confidence. Expressing complete confidence (score of 4) with a completely incorrect response resulted in a bias score of +4 (four minus zero), indicating extreme overconfidence. Hence, bias scores larger than zero correspond to overconfidence and scores less than zero correspond to under-confidence. The same procedure was used to compute a mean bias score for each student regarding self-assessment. Descriptive statistics were computed for students' ratings of self-efficacy, self-assessment, and mean bias. Descriptive statistics for self-efficacy and self-assessment on a per-item basis are also reported. The mean bias score served as the single continuous dependent variable for each of the three-way ANOVAs used to analyze the data for questions four and five. For both analyses, the three categorical independent variables were gender (2 levels), reading ability (3 levels), and mathematics ability (3 levels). The experimental method of computing sums of squares was used for both of the questions (Stevens, 1999). Chapter four includes a full explanation of the method selection process. Post-hoc Tukey tests were conducted for all significant main effects with more than two levels. Criterion scoring reliability and internal consistency were established through the inter-rater reliability work and Cronbach alpha computation that was conducted in the analysis of questions two and three.


Because the only other data included in the analysis of these questions involved scores that were self-reported by the participants using a scale that has been reliably used in other studies (Boekaerts et al., 1995; Vermeer et al., 2000), no further reliability work was required. The data for these two questions were also independently analyzed using multiple regression. The independent variables for both questions were student gender, student mathematics FCAT scale scores (ranging from 100-500), student reading FCAT scale scores (ranging from 100-500), and the interaction of gender with both the mathematics and reading scores. This analysis allowed the researcher to draw conclusions as to the relationship between the independent variables and the dependent variables of mean bias for self-efficacy or mean bias for self-assessment, and the extent to which an increase in students' FCAT scores yielded an increase in the dependent variables.

Summary

A non-experimental, causal-comparative design was used to study the relationship between the independent variables of gender, reading ability, and mathematics ability and the dependent variables of students' problem-solving processes, self-efficacy, self-assessment, performance, and linguistic explanations. The participants were all fifth-grade students from three Florida elementary schools who obtained parental permission. Participant measurements were obtained through administration of a twelve-item problem-solving test.


Member checks were performed with a selected sample of students based on performance and linguistic explanation scores. Statistical procedures for data analysis included descriptive statistics for all five questions. Qualitative analysis and chi-square statistical tests were used to analyze the data to answer the first research question, involving problem-solving processes. Three-way factorial ANOVAs were used for each of the remaining four questions, with the independent categorical variables of gender (2 levels), reading ability (3 levels), and mathematics ability (3 levels). Post-hoc Tukey tests were performed to identify which group means were significantly different when the F test indicated a significant main effect. Total performance scores based on a four-point rubric were the continuous dependent variable for question two. Total scores for linguistic explanations based on a three-point rubric were the continuous dependent variable for question three. Questions four and five, involving self-efficacy and self-assessment, required the computation of mean bias scores for use as the continuous dependent variable. Inter-rater reliability was established through the independent scoring of a random sample of test papers by a mathematics education professional. Cronbach alpha coefficients were obtained to estimate the internal consistency of test items for performance, linguistic explanation, self-efficacy, and self-assessment.
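As a concrete illustration of the analysis pipeline summarized above, the Python sketch below computes each student's mean bias score (mean self-reported rating minus mean performance score) and fits a 2 x 3 x 3 factorial ANOVA, followed by a Tukey post-hoc comparison on one factor. The data frame, column names, and simulated values are hypothetical; the real analysis used the study data set and the sums-of-squares methods described in this chapter.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
n = 180  # hypothetical sample size; the actual study included 237 students

# One row per student: ability groups, mean self-efficacy rating (0-4),
# and mean performance score (0-4). All values are simulated placeholders.
students = pd.DataFrame({
    "gender":  rng.choice(["boy", "girl"], n),
    "reading": rng.choice(["low", "middle", "high"], n),
    "math":    rng.choice(["low", "middle", "high"], n),
    "mean_efficacy": rng.uniform(0, 4, n).round(2),
    "mean_perf":     rng.uniform(0, 4, n).round(2),
})

# Mean bias = predicted minus actual performance, ranging from -4 to +4;
# positive values indicate overconfidence, negative values under-confidence.
students["bias"] = students["mean_efficacy"] - students["mean_perf"]

# 2 x 3 x 3 factorial ANOVA on the mean bias score.
model = smf.ols("bias ~ C(gender) * C(reading) * C(math)", data=students).fit()
print(sm.stats.anova_lm(model, typ=1))  # sequential (Type I) sums of squares

# Pairwise Tukey comparison for a three-level factor (illustrative only).
print(pairwise_tukeyhsd(students["bias"], students["reading"]))
```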


CHAPTER 4

FINDINGS

The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter consists of the results of the statistical tests for each of the five study questions. The descriptive and inferential statistical results, together with the interpretation of the results, are organized according to the question they address.

Question One: Findings for Problem-Solving Processes

The findings in this section address the following research question: To what extent are students' choices of mathematical problem-solving processes related to gender? The data for problem-solving processes, the variable of interest for this question, were obtained via a qualitative analysis of students' work on the study test instrument.
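Before turning to the coding itself, it may help to see the shape of the gender comparison used for this question. The sketch below runs a chi-square test of independence on a hypothetical theme-by-gender frequency table of the kind produced by the coding described in the next section; the counts are invented and serve only to illustrate the test, not to report study results.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of responses coded to three solution-process themes,
# cross-tabulated by gender (rows: boys, girls). Not the study data.
observed = np.array([
    [42, 31, 27],   # boys
    [35, 33, 30],   # girls
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```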


Coding and Emergent Themes To answer this question, each students work on each question was first coded to indicate the specific problem-solving process used to solve the problem. Appendix M provides a list of all identified processes for all test questions, and shows the frequencies and percentages of students use of the processes. The actual number of processes identified for each test item ranged from a low of twelve for the pencil problem to a high of twenty-eight for the string problem. After all processes were coded, the researcher identified three broad categories that would serve as a preliminary organizing framework for all responses: Successful response (S), Partially Successful response (P), and Unsuccessful response (U). Through a process of emergent analysis (Patton, 2002), five to six solution process themes were identified for each of the three broad categories, yielding a total of seventeen themes. All student responses for all test items for all students were coded again, this time according to the theme that encompassed the problem-solving process for each question. Because the calculator question required students to solve the problem in two different ways, the student responses for each of the two ways received a separate coding. For this reason, each students test paper was coded for thirteen themes. The list of process descriptions in Appendix M also includes the category and theme to which each process was assigned. The following examples represent a subset of the problem-solving processes identified for the school lunch problem and serve to illustrate the relationship between the processes, categories and themes. The school lunch problem 188


states: "Sam can purchase his lunch at school. Each day he wants to have juice that costs 50¢, a sandwich that costs 90¢, and fruit that costs 35¢. His mother has only $1.00 bills. What is the least number of $1.00 bills that his mother should give him so he will have enough money to buy lunch for 5 days?"

Process: $9 answer found by adding the 3 items x 5, then rounding (minor computational error okay).
Category: Successful response.
Theme: Solved using a traditional method, not a drawing/diagram.

Process: $8.75 answer found by adding the 3 items x 5 (minor computational error okay).
Category: Partially Successful response (student did not answer the question).
Theme: Either difficulty understanding the question or inattention to the question.

Process: Rounded total for one day x 5, but with a major computational error.
Category: Partially Successful response.
Theme: Computational problems.

Table 13 presents all the categories and themes of problem-solving processes that emerged in this study. To clarify some of the terms used in the coding process, a traditional method is defined for this study as a method that a textbook curriculum or a classroom teacher might present as a means of solving the problem. The researcher's knowledge of what constitutes a traditional method derives from personal experience with teaching mathematics at the fifth-grade level.


Table 13 Categories and Themes of Problem-Solving Processes Category Theme Description Successful 1 Solved using traditional method not a drawing/diagram Successful 2 Solved using novel method not a drawing/diagram Successful 3 Solved via drawing/diagram Successful 4 Solved using simplified method not a drawing/diagram (e.g., repeated addition vs. multiplication, trial and error, etc.) Successful 5 Solved correctly without written work Partially Successful 6 Computational problems Partially Successful 7 Either difficulty understanding question or inattention to question Partially Successful 8 Problems with mathematical logic/concept (e.g. incorrect order of operations) Partially Successful 9 Problems with drawing/diagram (either creation of own or use of given) Partially Successful 10 Novel approach to problem that shows good thinking, but does not produce a solution within the confines of the problem Partially Successful 11 Reason for difficulty cannot be determined, but answer shows some entry into problem Unsuccessful 12 Computational problems Unsuccessful 13 Either difficulty understanding problem or inattention to question Unsuccessful 14 Problems with mathematical logic/concept Unsuccessful 15 Problems with drawing/diagram Unsuccessful 16 Nonsense answer Unsuccessful 17 Source of error cannot be determined 190


Flowerpot item: Multiplied 20 pots x 3 seeds per pot = 60 seeds needed. String item: Divided by and answered 6 pieces. Pizza item: Answered that Joses pizza was bigger, therefore the half he ate was bigger. Calculator item: Added 100 to the incorrect number of 8275 to get the desired number of 8375. A novel method is one that incorporates creative thinking with sound mathematical logic. Some examples of responses judged to be novel follow: Flowerpot item: Grouped pots by 10 x 3 seeds = 30 seeds. 30 seeds x 2 groups = 60 seeds. String item: Converted all measures to inches and solved correctly. Pizza item: Jose is right because his half had more toppings and was therefore more pizza. Calculator item: Multiply the display by zero to get zero, then enter the desired number. A simplified method is one which employs, for example, repeated addition of five items instead of multiplying by five, or trial and error addition as a substitute for division. The term simplified is not used to indicate that the actual computation required by the chosen method is easier, but that the operation chosen is taught at an earlier age and is considered to be a more rudimentary method of completing the computation. For example, children are taught addition before multiplication. For a child who is competent in both operations, multiplying 191


a single number by five would be faster than adding the number five times. However, because repeated addition requires a lower level of computational knowledge, it is considered a simplified method for this study. Items coded with a theme of solving via a drawing/diagram are those that used the drawing/diagram as the actual solution method, not simply as a way to illustrate computational work. Some themes were duplicated under more than one category because they represented varying levels of success with a particular method. An example of this is themes 6 and 12 computational problem. To begin, a response that used productive mathematical logic, included computation that indicated a complete understanding of the computation process, but contained a careless error, was categorized as a successful response. In order for a response to be coded with the partially successful theme six computational problems, the computational work had to show some level of sound mathematical logic, but indicate a misunderstanding of the computational process. An example of this would be a student who knew that to solve the flour problem required multiplying 3 x 1 1/3, but incorrectly multiplied the fraction and reported the answer as 3 and 3/9. If that same student completed the same computation and reported the answer as 1 3/9, the response would have been coded as the unsuccessful theme 12 computational problems. In this instance the student erred in the computation for both the whole number and the fraction, and provided an answer that was unreasonable given the context of the problem. 192
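For reference, the correct computation for the flour example discussed above is

$$
3 \times 1\tfrac{1}{3} \;=\; 3 \times \tfrac{4}{3} \;=\; \tfrac{12}{3} \;=\; 4
$$

so both erroneous responses (3 3/9 and 1 3/9) fall short of the correct product of 4; the unsuccessful response is also unreasonable in the context of the problem.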


The theme Either difficulty understanding question or inattention to question appeared in the categories of Partially Successful and Unsuccessful. An example of a partially successful response that was coded with this theme comes from the Ms. Hernandez problem. The question in this item required students to divide the total number of students by the size of each team, then interpret the remainder as the number of substitutes. A student response that completed the division but reported the number of teams instead of the number of substitutes was coded with the partially successful version of this theme. This student showed an understanding of part of the problem, but was inattentive to the actual question. An example of an unsuccessful response for this theme comes from the school lunch problem. The item required students to find the total cost of purchasing three school lunch items for five days, and then requires that the total be rounded to the next dollar because the mother only has one dollar bills to give the child. A few student responses indicated that they interpreted the question to be asking Given that the three lunch items cost this much each, what could the child buy if he only had one dollar to spend? This response shows a complete misunderstanding of the problem and was coded as unsuccessful for this theme. Descriptive Statistics The first statistical measures used to understand the data for this question were descriptive in nature. Table 14 organizes the data according to theme, and indicates the number and percentage of boys and girls whose problem-solving processes were coded according to each theme. The percentages in the total 193


Table 14 Descriptive Statistics for Coding of Problem-Solving Process Themes Girls Boys Total Theme Description n % n % n % Successful Reponses 1 Traditional 605 43.1 793 47.3 1398 45.4 2 Novel 67 4.8 100 6.0 167 5.4 3 Drawing/diagram 31 2.2 25 1.5 56 1.8 4 Simplified 46 3.3 51 3.0 97 3.2 5 No work 9 0.6 13 0.8 22 0.7 Partially Successful Responses 6 Computational problems 89 6.3 83 5.0 172 5.6 7 Question difficulty 142 10.1 174 10.4 316 10.3 8 Math logic problems 31 2.2 38 2.3 69 2.2 9 Drawing/diagram problems 25 1.8 22 1.3 47 1.5 10 Novel with problems 10 0.7 11 0.7 21 0.7 11 Cannot be determined 50 3.6 73 4.4 123 4.0 Unsuccessful Responses 12 Computational problems 15 1.1 19 1.1 34 1.1 13 Question difficulty 39 2.8 26 1.6 65 2.1 14 Math logic problems 27 1.9 37 2.2 64 2.1 15 Drawing/diagram problems 27 1.9 21 1.3 48 1.6 16 Nonsense answer 40 2.9 45 2.7 85 2.8 17 Cannot be determined 151 10.8 146 8.7 297 9.6 Note. N=3,081 responses. Percentages for boys and girls are based on 108 girls and 129 boys. Minor errors in percentage totals are due to rounding. 194


column are based on the sample of 3,081 responses from 237 students (129 boys (54%) and 108 girls (46%)) to thirteen items (the calculator question was coded with two themes).

Inferential Statistics

The chi-square statistical test was used for this question to determine whether the proportions of girls and boys who were observed to have used a particular problem-solving process theme to answer the questions differed significantly from the theoretically expected proportions. The test was run for each question individually, and the chi-square results, p-values, and Cohen's w (1992) effect size estimates for each of the tests are reported in Table 15. The guidelines for interpreting Cohen's w are that small, medium, and large effect sizes are 0.10, 0.30, and 0.50, respectively. In order to keep the overall Type I error rate at α=.05, the Bonferroni adjustment was applied, resulting in α=.004 for each question. The null hypothesis for this question was that there was no statistically significant difference in the percentages of boys and girls across the themes of the processes they used to solve the problems. The chi-square results indicate that none of the differences were statistically significant, resulting in a failure to reject the null hypothesis. There is no way to be sure whether this failure to reject was indicative of a lack of gender differences in the population or the result of this study being underpowered due to the size of the sample. If, in fact, there is a gender difference in the population but the effect size is small, the sample for this study would have been too small to detect the difference.
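A minimal sketch of the per-item analysis described above follows. It is not the study's original code; the observed frequencies are simply the pencil-problem counts from Table 16, so it should reproduce (within rounding) the corresponding row of Table 15.

```python
# Sketch of the per-item chi-square analysis with a Bonferroni-adjusted alpha
# and Cohen's w; rows = girls/boys, columns = process themes (Table 16 counts).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [35, 3, 8, 5, 13, 6, 12, 2, 24],   # girls
    [50, 14, 5, 5, 25, 4, 7, 4, 15],   # boys
])

chi2, p, dof, expected = chi2_contingency(observed)

n = observed.sum()                      # 237 students
w = np.sqrt(chi2 / n)                   # Cohen's w: .10 small, .30 medium, .50 large

alpha = 0.05 / 13                       # Bonferroni adjustment across the 13 coded items
print(f"chi2 = {chi2:.4f}, p = {p:.4f}, w = {w:.4f}, reject H0: {p < alpha}")
```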


Table 15 Chi-Square Results for Problem-Solving Process Themes Question X 2 p w Flowerpot 10.6523 0.2222 0.2120 Ms. Hernandez 5.6407 0.6874 0.1543 Calories 9.9248 0.0774 0.2046 Jeans Class 12.4521 0.0866 0.2292 Pizza 10.5211 0.2303 0.2107 String 11.5281 0.4843 0.2205 Balloons 7.3760 0.4967 0.1764 Flour 12.5717 0.1275 0.2303 Calculator1 3.0895 0.7975 0.1142 Calculator2 7.4091 0.3876 0.1768 Lunch 2.0554 0.8414 0.0931 Pencils 16.9784 0.0303 0.2677 Birdseed 14.9828 0.1833 0.2415 Note. N=237, X 2 =Chi-squared, w=Cohens w measure of effect size. None of the effect sizes were classified as large, but the effect size for the pencil question was the highest of all questions, and was nearly medium. It is, therefore, the only one that bears further discussion. Table 16 presents the frequencies and percentages of boys and girls whose problem-solving processes for this question were coded with each theme. Missing theme numbers indicate that no student responses were coded for that theme for this problem. 196


Table 16 Problem-Solving Process Themes for the Pencil Problem Girls Boys Total Theme Description n % n % n % Successful responses 1 Traditional 35 14.8 50 21.1 85 35.9 2 Novel 3 1.3 14 5.9 17 7.2 3 Drawing/diagram 8 3.4 5 2.1 12 5.5 Partially Successful Responses 6 Computational problems 5 2.1 5 2.1 10 4.2 7 Question difficulty 13 5.5 25 10.6 38 16.0 Unsuccessful Responses 12 Computational problems 6 2.5 4 1.7 10 4.2 13 Question difficulty 12 5.1 7 3.0 19 8.0 15 Drawing/diagram problems 2 0.8 4 1.7 6 2.5 17 Cannot be determined 24 10.1 15 6.3 39 16.5 Note. Minor errors in percentage totals are due to rounding. N=237 (108 girls and 129 boys). Because boys constituted 54% of the sample for this question, it was expected that the number of boy responses for each theme would be greater than the number of girl responses. However, girl responses outnumbered boy 197


responses in three of the four themes that were categorized as unsuccessful: themes 12, 13, and 17. Over four times as many boys as girls solved the problem successfully via a novel method, and more girls than boys successfully solved the problem using a drawing or diagram. Nearly twice as many boys as girls were coded with theme 7 indicating that they were only partially successful because they either had difficulty understanding the question or were inattentive to the question. The lack of significant findings led to the decision to re-run the chi-square tests after the themes were collapsed. The data for themes that appeared under more than one major category were combined, such as the successful, partially successful, and unsuccessful themes for solving via a drawing or diagram. Again, no significant results were found. Question Two: Findings for Mathematical Performance The findings in this section address the following research question: To what extent is childrens performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? The data for mathematical performance, the dependent variable for this question, were obtained through the administration of the study test instrument and consisted of twelve constructed-responses mathematical problem-solving items. Each item 198


was scored according to a four-point holistic rubric (Appendix A), resulting in a possible total score ranging from 0 to 48 points. Descriptive Statistics Descriptive Analysis by Question Analysis of overall sample. Table 17 shows the mean, standard deviation, skewness, and kurtosis for students total mathematics rubric scores with a total possible score of 48, as well as for each of the twelve questions that have a total possible score of 4 points. The data for students total math performance scores yielded a mean of 26.76, a standard deviation of 7.24, and a relatively normal distribution with skewness and kurtosis values at 0.04 and -0.61. As expected, the distributions for individual questions for which students scored relatively high were negatively skewed. Several of the distributions for individual questions were noticeably leptokurtic, a characteristic that is indicative of a spiked curve with many more extreme scores than would be found in a normal distribution. As with skewness, this level of kurtosis is not unexpected for the scores of a single item. Because the total mathematics performance scores were nearly normal, the distributions do not violate the assumption of normality essential to the factorial ANOVA test. The highest and lowest scoring questions were the flowerpot and string problems, respectively. The flowerpot item required students to determine the number of pots pictured and multiply that number by three to determine how many seeds would 199


be needed to plant three seeds in each pot. The computation required for this question was below the fifth-grade level, and should have resulted in high scores. Table 17 Descriptive Statistics for Mathematics Performance Scores Question M SD Skewness Kurtosis Flowerpots 3.08 0.83 -1.38 2.88 Ms. Hernandez 2.68 0.89 -1.31 1.25 Calories 2.83 0.88 -0.85 0.37 Jeans Class 2.68 0.98 -0.51 0.20 Pizza 2.31 1.30 -0.15 -1.12 String 0.90 1.06 1.07 0.06 Balloons 2.91 0.88 -1.46 2.75 Flour 2.04 1.02 -0.48 -0.22 Calculator 1.79 1.41 -0.02 -1.36 School Lunch 2.22 1.03 -0.79 -0.31 Pencils 1.87 1.48 -0.06 -1.43 Birdseed 1.46 1.26 0.46 -1.04 Total 26.76 7.24 0.04 -0.61 Note. N=237, max. score for individual questions = 4, max. score for total =48. The string problem required students to divide fractions to determine how many pieces of string, each of a yard long, could be cut from a piece of string of a 200


yard long. Division of fractions is not usually explicitly taught until the end of the first half of fifth grade, therefore it is not surprising that few students were able to reason out a successful response. The results for the easiest and hardest items mirror the findings of the pilot study and confirm the decisions as to placement of the questions according to difficulty. The test began with what turned out to be the easiest question, allowing students to become engaged in taking the test. The hardest question (string) was placed midway through the test and was followed by the second easiest question (balloons) so that students potential discouragement resulting from a difficult question could be overcome by an item on which they could be successful. The pencil and calculator questions had the highest dispersion of scores, and the flowerpot, calories, balloons, and Ms. Hernandez questions, ranking 1, 2, 3 and 5 in order of difficulty, had the smallest standard deviations. Analysis by group. Tables showing the mean and standard deviations of students scores for each question, organized according to group, are provided in Appendix N. A look at the mean and standard deviations of boys and girls scores on individual questions yielded some interesting findings. Boys outscored girls on eight of the twelve questions. The questions for which girls mean score was higher than boys were the flowerpot, balloon, pizza, and lunch questions, whose respective ranks in order of difficulty according to overall means were 1, 2, 6, and 7. A rank of one indicates that the question garnered the highest mean score of all questions and was therefore the easiest question on the test for the study participants. Generally speaking, this result signifies that girls outscored 201


boys on some of the easiest questions, and boys outscored girls on all of the hardest questions. There did not appear to be a relationship between how well boys and girls performed on individual items based on the gendered wording. The four items for which girls scored higher than boys had a variety of classifications of gendered wording; two had male wording, one was neutrally worded, and the wording of the other included both male and female wording. The standard deviation of girls scores was higher than that of boys scores for eight of the twelve questions and the same as boys for two items. Boys standard deviations were higher than girls for the flowerpot and string questions, ranked 1 and 12 respectively in order of difficulty. In other words, there was more variability in the boys scores than the girls scores for only the easiest and most difficult questions on the test. A comparison of the mean scores of high, middle and low ability mathematics students reveals that for each group, the rank order of the means for individual questions did not vary significantly from the rank order of the means for the entire sample. The rank order of the scores for the middle and low ability groups varied by no more than one place from the ranking found in the overall sample. For the high ability group, the rank order of all items was within one place of the order for the overall sample except for the school lunch and pencil problems whose order differed by two places. When the students were grouped according to reading ability, the rank order of the scores for all three groups differed by no more than one place for all questions. 202


Descriptive Analysis of Total Mathematics Performance Scores Table 18 shows the mean and standard deviation of students total mathematics performance scores for each level of gender, mathematics ability, and reading ability. The figures indicate that boys as a group scored 1.19 total points higher on the test than girls. The standard deviation of boys scores was slightly lower than that of girls. When viewing the mathematics scores through the lens of mathematics ability, predictably, students of higher ability scored Table 18 Total Mathematics Performance Scores by Group Group n M SD By Gender Boys 129 27.30 7.12 Girls 108 26.11 7.37 By Mathematics Ability High 57 34.19 5.27 Middle 129 26.11 5.64 Low 51 20.10 5.06 By Reading Ability High 73 32.44 5.97 Middle 134 25.24 5.91 Low 30 19.73 5.92 Note. N=237, max performance score = 48 points. 203


higher. The mean difference between the high and middle groups (8.08) exceeded the difference between the middle and low groups (6.01). The standard deviations of the scores for all three mathematics ability groups were similar with a range of 5.06 to 5.64 points. When the sample was grouped according to reading ability, similar results were found. The students of higher reading ability received higher total mathematics performance scores than students of lower reading ability. The difference between the high and middle groups was 7.20 points, and the difference between the middle and low groups was 5.51 points. The dispersion of scores within the high, middle, and low ability groups was nearly identical. The box and whisker plot shown in Figure 1 illustrates the total performance scores for the study sample organized according to three different grouping variables: gender, mathematics ability, and reading ability. For the grouping variable of gender, the boxplot illustrates the wider range of scores for girls, but shows that the girls H-spread (Glass & Hopkins, 1996), the distance between the first and third quartiles, is smaller than boys. Because the H-spread defines the middle-scoring half of the group, it can be concluded that the middle half of all girls scored within a smaller range than the middle half of all boys. The boxplots for the grouping variable of mathematics ability illustrate that the sample students total scores behaved in a predictable fashion. The maximum, quartile 3, median, quartile 1, and minimum scores were all patterned according to ability, with the high ability group having the highest scores, the middle ability group having the second highest scores, and the low ability group having the lowest 204


scores. The H-spread for the middle ability students was slightly larger than for the high and low ability students. The entire H-spread of the high-ability group was above that of the middle-ability group. The boxplots for the grouping variable of reading ability illustrate the same pattern as that found for mathematics ability, with the students of higher ability having higher scores for all quartiles.

Figure 1. Box and whisker plot of total performance scores grouped by gender, mathematics ability, and reading ability.

Correlations Between Mathematics Performance and Other Variables

As described in the methods chapter, the teachers whose students participated in the study were asked to rate how well prepared their students were to answer each of the questions. Therefore, the teachers were rating each test item's difficulty relative to a generalized notion of their students' abilities. The ratings were recorded using a five-face Likert scale, and the teachers' responses were converted to numerical scores ranging from 4 (very well prepared) to 0 (not well prepared). Table 19 shows the teacher ratings of their students' preparedness for each of the twelve test items. The average of the teachers' responses was correlated with the mean score for each question, with a resulting Pearson correlation coefficient of 0.89. This indicates a strong positive relationship between teachers' predictions of their students' performance and their actual performance. When the teachers' scores were partitioned according to the gender of the teacher, the results showed a stronger positive correlation for the five male teachers' responses (r = 0.90) than for the nine female teachers' responses (r = 0.74). Additional correlations were computed, separately for each school, to determine the strength of the relationship between the teachers' assessments of their students' preparedness and the students' actual performance. The correlation coefficients were 0.85, 0.72, and 0.89 for schools one, two, and three, respectively. These results indicate that the teachers at school three were the most accurate in their assessment of how well prepared their students were to be successful on the test items. Students' mean mathematics performance scores were also correlated with the reading level of each question. The reading level was determined using the Homan-Hewitt Readability Formula (Homan, Hewitt, & Linder, 1994), whose process is described in chapter three. The resulting Pearson correlation coefficient of 0.33 indicates a weak positive relationship.
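The correlation just described amounts to a simple Pearson computation over the twelve items. The sketch below is illustrative only: the item means come from Table 17, but the teacher-rating averages are stand-in values, not the study's exact per-item averages (those can be derived from Table 19).

```python
# Illustrative sketch of the teacher-rating/performance correlation.
import numpy as np
from scipy.stats import pearsonr

mean_item_score = np.array([3.08, 2.68, 2.83, 2.68, 2.31, 0.90,
                            2.91, 2.04, 1.79, 2.22, 1.87, 1.46])   # Table 17 means
mean_teacher_rating = np.array([3.9, 3.3, 3.6, 3.4, 2.1, 1.1,
                                3.3, 2.2, 2.9, 2.7, 2.1, 2.2])     # placeholder averages

r, p = pearsonr(mean_teacher_rating, mean_item_score)
print(f"Pearson r = {r:.2f} (p = {p:.4f})")
```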


Table 19
Teacher Rating of Student Preparedness by Question

School  Teacher  Gender  Flowerpot  Ms. Hernandez  Calories  Jeans Class  Pizza  String
1       A        M       4          3              4         4            2      1
1       B        M       4          4              3         4            3      0
1       C        M       4          4              4         4            2      3
1       D        F       4          4              4         4            3      1
2       A        F       4          4              4         3            2      1
2       B        F       3          2              3         3            4      3
2       C        F       4          3              4         4            1      2
2       D        M       4          4              4         3            3      2
2       E        F       4          3              2         4            2      0
2       F        M       4          3              4         4            1      1
3       A        F       3          2              4         3            3      2
3       B        F       4          4              4         2            1      0
3       C        F       4          4              3         4            2      1
3       D        F       4          3              3         3            1      0
3       E        F       4          4              4         4            2      0
3       F        F       4          3              4         3            2      1

Note. Teacher ratings range from 0 to 4 points. A score of four indicates the highest level of student preparedness.

Table 19 (continued)
Teacher Rating of Student Preparedness by Question

School  Teacher  Gender  Balloons  Flour  Calculator  Lunch  Pencils  Birdseed
1       A        M       4         1      4           4      2        2
1       B        M       4         0      2           3      1        2
1       C        M       4         4      2           4      2        3
1       D        F       4         3      3           3      1        1
2       A        F       3         3      3           3      3        2
2       B        F       1         2      3           3      3        3
2       C        F       4         0      4           2      3        4
2       D        M       3         4      2           2      3        2
2       E        F       4         2      2           1      1        3
2       F        M       3         1      .           .      .        .
3       A        F       4         2      3           3      2        2
3       B        F       3         2      3           2      1        2
3       C        F       4         2      2           2      1        2
3       D        F       3         2      3           1      2        3
3       E        F       2         2      3           3      0        1
3       F        F       2         2      3           3      2        1

Note. Teacher ratings range from 0 to 4 points. A score of four indicates the highest level of preparedness. Missing data indicated by "."


The amount of time each student used to take the test was noted as each student submitted his/her finished work. The mean elapsed time for the sample was 39.41 minutes with a standard deviation of 12 minutes. Girls mean time was almost exactly two minutes longer than boys, but the dispersion of boys elapsed time was greater. Elapsed time was correlated with students total mathematics performance scores to determine if there was a direct relationship between the two. The Pearson correlation coefficient was 0.20, indicating a weak positive relationship. The final correlations studied related students mathematics performance scores to the accuracy of their self-efficacy and self-assessment. The Pearson correlation coefficient for self-efficacy mean bias was -0.47, indicating a fairly strong negative relationship. This can be interpreted to mean that higher performance scores were indicative of lower self-efficacy mean bias scores; the students who performed better on the test were more accurate in their feelings of self-efficacy. The Pearson correlation coefficient for self-assessment mean bias was -0.43, also indicating a moderately strong negative relationship. Students who achieved higher scores on the test were more accurate assessors of their own work than students with lower scores. Inferential Statistics Factorial ANOVA Results In order to suggest that differences in gender, mathematics ability, and reading ability would result in differences in mathematical performance scores in the population, chance must be ruled out as a plausible explanation for the 209


observed differences in the sample. To assess the tenability of a chance explanation, a 2x3x3 factorial ANOVA was conducted with an alpha level set at .05 for each effect. The degree to which the Type I error rates were actually controlled to the specified alpha level depended on how adequately the data met the assumptions of independence, normality, and equal variances. The assumption of independence was met by ensuring that different observations came from different individuals. The study test was administered individually, and each student completed her/his work alone. The descriptive statistics indicate that the assumption of normality was not violated. To assess the assumption of equal variances, Levene's test for homogeneity of variances was utilized. This procedure tests the null hypothesis that the group variances are equal. The results of Levene's test, F(12, 224)=0.76, p=0.6954, indicated that the null hypothesis cannot be rejected. As a result, the variances of the group means were not found to be heterogeneous, leading to the conclusion that the assumption of equal variances was not violated. According to this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA.
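A minimal sketch of the equal-variances check described above is shown below; the file and column names are assumptions for illustration, not the study's actual data files.

```python
# Sketch of Levene's test across the 2x3x3 design cells (assumed column names).
import pandas as pd
from scipy.stats import levene

df = pd.read_csv("scores.csv")   # hypothetical file: one row per student with
                                 # total_math, gender, math_ability, reading_ability

groups = [cell["total_math"].to_numpy()
          for _, cell in df.groupby(["gender", "math_ability", "reading_ability"])]

stat, p = levene(*groups)
print(f"Levene statistic = {stat:.2f}, p = {p:.4f}")  # p > .05 -> equal variances not rejected
```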


The cell sizes for this study were not equal. If the disparity in cell sizes was not deemed to be the result of differences in the population, the regression approach (Type III) to analyzing the sums of squares would have been in order. However, it was expected that, in accordance with the population, most of the sample would collect across the diagonal that represented identical categorization for both mathematics and reading ability. It was also expected that instances of a student being classified as high in one subject area and low in the other would be rare. Because the sample cell distribution was judged to be indicative of the distribution in the population, either the Type I (hierarchical) or Type II (experimental) method of computing sums of squares was appropriate. The hierarchical method is reserved for situations in which there is an established order for the effects and was judged to be the most appropriate for answering this question. For the hierarchical method, each effect is adjusted only for those preceding it in the order. For this question, gender was placed first in the ordering because it can be argued that gender may affect mathematics ability and reading ability, but not vice versa. Mathematics ability was placed second as it is more closely tied to the outcome variable of mathematics performance than reading ability (Ethington, 1992; Fierros, 1999). The obtained F(12,224)=18.18, p<.0001 for the overall F-test indicates that one or more of the independent variables was significantly related to the mathematics performance scores. The R-square value of 0.4933 shows that nearly 50% of the variance in the scores can be explained by the independent variables. The results of the main and interaction effects for the three-way factorial ANOVA are presented in Table 20. The factorial ANOVA results indicate that the main effect for mathematics ability at F(2,236)=96.39, p<.0001 is statistically significant using a predetermined Type I error rate of .05. Omega squared, a conservative population estimate of the proportion of variance in the dependent variable associated with


the independent variable (Stevens, 1999), shows that an estimated 43% of the variance in population means for mathematical performance was related to Table 20 Factorial ANOVA Results for Total Mathematics Performance Scores Source df MS F p 2 f Gender 1 83.42 2.98** 0.0857 0.00 0.11 Mathematics 2 2698.39 96.39 <.0001 0.43 0.89 Gender*Math 2 11.95 0.43 0.6530 0.00 0.06 Reading 2 150.71 5.38 0.0052 0.02 0.21 Gender*Reading 2 122.17 4.36 0.0138 0.02 0.19 Math*Reading 3 7.37 0.26 0.7688 -0.01 0.06 Gender*Math*Reading 1 41.92 1.50 0.2223 0.00 0.08 Note. N=237, 2 = omega squared, f=Cohens f measure of effect size. *p<.05. **p<.10. mathematics ability. The Tukey test can be used as a post hoc procedure to determine where the significant differences lie while maintaining the overall alpha rate at .05. The Tukey test for mathematics ability indicated that the differences in means between the three ability groups all differed by a significant amount. This result was expected, and is not of particular interest for further discussion. Of greater interest was the result that reading ability was found to be statistically significant with F(2, 236)=5.38, p=.0052. This finding indicates that if the null hypothesis was true, that there was no relationship between reading 212


ability and mathematical performance on the problem-solving test, the probability of obtaining an F as large or larger than the one obtained would be .0052. Because this probability is so small (less than .05), the null hypothesis is rejected in favor of an alternative hypothesis that suggests that at least one pair of population group means differ. Cohens f effect size measures how far the group mean typically deviates from the grand mean in standard deviation units. The guidelines for evaluating Cohens effect size are that a value of 0.1 indicates a small effect, 0.25 indicates a medium effect, and 0.4 represents a large effect (Stevens, 1999). The Cohens f value of 0.21 for reading ability indicates a medium effect size, and the omega squared value of .02 indicates that 2% of the population variance in mathematics performance is related to reading ability. The follow-up Tukey test indicated that the group means between all three ability levels differed by a significant amount. The results also showed that the interaction effect for gender and reading ability was significant at F(2,236)=4.36, p=0.0138. This means that the difference in observed group means between boys and girls of high ability is enough different than the difference in group means between boys and girls of middle or low ability to conclude that the differences must exist in the population. To better understand this interaction, a graph showing the group means is provided in Figure 2. As a result of these findings, the null hypothesis that the interaction of gender and reading ability does not affect students mathematics performance is rejected. The graph shows that girls of high reading ability outscored boys of high reading ability by an average of nearly two points. The 213


reverse is true for boys and girls of middle reading ability, with boys outscoring girls by slightly over two points. For boys and girls of low reading ability, there was only a tenth of a point difference in average scores in favor of girls. This means that possessing high reading ability gives girls more of an advantage for this type of mathematical problem solving than it does boys. 0510152025303540HighMiddleLowReading AbilityMathematics Score Girls Boys Figure 2. Total mathematics score interaction of reading ability and gender. N=237. The F test for the interaction of reading ability and gender shows that the interaction is significant, but does not provide information as to where the differences between boys and girls performance scores were significant. Planned contrasts are a means of determining whether the gender differences 214


were significant at the high, middle, or low reading ability levels. The results show that the contrast differences between boys and girls of high reading ability was not significant F(1,236)=1.87, p=.1726, nor was the difference in group means of mathematics performance scores for boys and girls of low reading ability F(1,236)=0.00, p=.9506. The only contrast which reached significance was the one comparing boys and girls of middle reading ability F(1, 236)=4.74, p=.0305. The remainder of the null hypotheses must stand because the factorial ANOVA failed to find significant differences. The main effect of gender on mathematical performance in the sample was not significant at F(1, 236)=2.98, p=.0857. Therefore, no conclusions can be drawn about the relationship between gender and mathematics scores for the population. This does not necessarily mean that this variable does not moderate mathematics performance in the population, but that this study, as conducted, did not have the power to rule out chance as the cause of the differences. Multiple Regression Analysis Introduction. The sample for this analysis consisted of 237 students, 108 girls and 129 boys. The three main effect predictor variables considered for the analysis were gender (dummy coded with girl=1, boy=0), FCAT mathematics scale score (abbreviated as FCATM and ranging from 100-500) and FCAT reading scale score (abbreviated as FCATR and ranging from 100-500). Because one of the interaction effects was significant in the factorial ANOVA, the interactions of gender with reading ability and mathematics ability were also 215


considered as predictors. The outcome variable for this analysis was students total mathematics performance score with a possible range of 0-48 and an abbreviation of TOTALM. The multiple regression test was first conducted using the three main effects and the two interaction effects. Neither interaction effect was found to be significant. Because the inclusion of interaction effects in multiple regression analyses makes interpretation of the regression coefficients and standardized regression coefficients exceedingly complex, and because the interactions were not significant, the multiple regression was run again without the interactions. Although the results and discussion presented in text are based on the model without interactions, a table presenting the results with the interactions is included in Appendix T. Descriptive statistics. The first step in the analysis was to examine the descriptive statistics for each continuous variable involved in the study. The mean of the FCAT mathematics scores was 331.09 with a standard deviation of 43.31. The distribution was not skewed (sk=0.04), but was noticeably leptokurtic (ku=1.22). The mean of the FCAT reading scores was slightly higher at 343.96 with a smaller standard deviation at 42.00. The distribution of scores was not skewed (sk=0.05), but was slightly leptokurtic (ku=0.97). Multiple regression results. A summary of the multiple regression results is presented in Table 21. Students FCAT mathematics and reading scores were both found to be significant predictors of their mathematics performance. Gender was not a significant contributor to the predictive utility of the regression 216


equation, and neither of the interactions was significant. The obtained R² value for the model was .5358, indicating that about 54% of the variability in the mathematics performance scores was explained using the set of predictors.

Table 21
Summary of Multiple Regression Analysis for Mathematics Performance

Variable                  b         β        SE       t       p
Intercept                 -18.8046  0.0      2.87926  -6.53*  <.0001
Gender                    -0.0562   -0.0039  0.6767   -0.08   0.9338
FCAT Mathematics Score    0.0798    0.4775   0.0103   7.72*   <.0001
FCAT Reading Score        0.0557    0.3230   0.0105   5.31*   <.0001

Note. N=237. b=regression coefficient, β=standardized regression coefficient, SE=standard error, * indicates significance at α=.05 level.

Because R² tends to be an overly optimistic estimate of population variability, the adjusted R² value is used as a more conservative alternative. The adjusted R² for this model was 0.5299, indicating very little shrinkage. Cohen's effect size f² = R²/(1-R²) was computed to be 1.1542, which is interpreted as a very large effect size using Cohen's rough guidelines (.02 small, .15 medium, .35 large). These results indicate that this combination of variables serves as a good predictor of total mathematics performance scores.
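As a check, the reported effect size follows directly from the obtained R²:

$$
f^2 \;=\; \frac{R^2}{1 - R^2} \;=\; \frac{0.5358}{1 - 0.5358} \;\approx\; 1.15
$$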


The standard error of estimate was 4.96557, which means that the prediction equation, on average, will produce a predicted total mathematics performance score that errs by about 5 points. The obtained prediction equation was:

TOTALM = -18.8046 - 0.0562*GENDER + 0.0798*FCATM + 0.0557*FCATR

To better interpret the equation, consider two students of the same gender who have identical FCAT reading scores. If one student had a 100-point higher FCAT mathematics score, he/she would be predicted to have a total performance score that was a little less than eight points higher. Similarly, if two students of the same gender had identical FCAT mathematics scores, but one had a 100-point higher FCAT reading score, he/she would be predicted to have a total mathematics performance score that was almost six points higher. The standardized regression coefficient of 0.4775 for FCATM indicates that a 1 standard deviation increase in a student's FCAT mathematics score would lead to a predicted 0.4775 standard deviation increase in her/his total mathematics performance score. The standardized regression coefficient of 0.3230 for FCATR means that a 1 standard deviation increase in a student's FCAT reading score would result in a predicted 0.3230 standard deviation increase in her/his total mathematics performance score. The standardized regression coefficient of -0.0039 for gender (corresponding to the raw coefficient of -0.0562) indicates that the predicted total performance score for girls would be 0.0039 standard deviations lower than boys'.
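A small sketch applying the obtained prediction equation (coefficients from Table 21) to the two comparisons just described; the example FCAT values are made up for illustration.

```python
# Sketch of the obtained prediction equation for total mathematics performance.
def predict_totalm(gender: int, fcatm: float, fcatr: float) -> float:
    """Predicted TOTALM (0-48 rubric points); gender dummy coded girl=1, boy=0."""
    return -18.8046 - 0.0562 * gender + 0.0798 * fcatm + 0.0557 * fcatr

# Same gender and reading score, 100-point difference in FCAT mathematics:
print(predict_totalm(0, 400, 340) - predict_totalm(0, 300, 340))   # ~7.98 points
# Same gender and mathematics score, 100-point difference in FCAT reading:
print(predict_totalm(0, 330, 400) - predict_totalm(0, 330, 300))   # ~5.57 points
```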


Squared semi-partial correlations for each predictor were also examined. FCATM was found to account uniquely for 11.9% of the variability in the performance scores, whereas FCATR accounted for only 5.6%, and gender accounted for only 0.001%. Assumptions. Regression analyses are based on a host of assumptions. The data were screened and an assessment of each assumption was made. The assumption of homoscedasticity of errors says that the variance of the errors stays the same as the predictions move along the regression line. In other words, the assumption is that the prediction errors created when the regression equation predicts low outcome values for students are similar to those found when high outcome values are predicted. To determine whether this assumption was violated, the residuals were plotted against the predicted values. The resulting plot indicated that this assumption was not violated. The residuals were also found to be normally distributed, with skewness of -0.02 and kurtosis of -0.53. A further examination of the residual plot indicated a linear relationship between the variables, and nothing in the design of the study indicates that the residuals are not independent. The removal of the three outliers prior to running the factorial ANOVAs resulted in no additional outliers being identified through a review of students' Cook's D values. Due to the stringent testing conditions under which students' FCAT scores were obtained, the assumption that the predictors were measured without error is satisfied. Although the predictors cannot be considered fixed, multiple regression is considered robust to violations of this assumption. In conclusion, based on the screening of the data, it appears that multiple regression was an appropriate procedure for analyzing the data for this question.
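The analysis and assumption checks described in this section could be reproduced along the following lines. This is a hedged sketch with assumed file and column names, not the study's original code.

```python
# Sketch of the regression and diagnostics described above
# (assumed column names: TOTALM, GENDER, FCATM, FCATR).
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("scores.csv")                     # hypothetical data file

model = smf.ols("TOTALM ~ GENDER + FCATM + FCATR", data=df).fit()
print(model.summary())                             # b, SE, t, p, R-squared

# Standardized coefficients (betas): refit on z-scored variables.
z = df[["TOTALM", "GENDER", "FCATM", "FCATR"]].apply(lambda s: (s - s.mean()) / s.std())
print(smf.ols("TOTALM ~ GENDER + FCATM + FCATR", data=z).fit().params)

# Residuals versus predicted values (homoscedasticity, linearity).
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted TOTALM")
plt.ylabel("Residual")
plt.show()

# Cook's D to screen for influential cases (common 4/n rule of thumb).
cooks_d = model.get_influence().cooks_distance[0]
print("Potentially influential cases:", int((cooks_d > 4 / len(cooks_d)).sum()))
```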


Conclusion. The fact that students FCAT mathematics scores were significant predictors of their mathematics performance on the study test is not surprising. The results for FCAT reading scores were also significant, and lend further support to previous research findings regarding the relationship between reading ability and problem-solving performance. Although gender did not contribute significantly to the ability of the regression equation to predict students total performance, because the data was easily obtained, there is no harm in including gender in the equation. Question Three: Findings for Linguistic Explanations The findings in this section address the following research question: To what extent is the quality of students linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? The data for the dependent variable for this question, linguistic explanations, were obtained through administration of the study test instrument. The test consisted of twelve constructed-response mathematical problem-solving items. The linguistic explanations students provided to explain their problem solving processes were scored according to a three-point rubric (Appendix B). The sum of the scores for all twelve items, ranging from 0-36, was used as the dependent variable for this question. 220


Descriptive Statistics Descriptive Analysis by Question Analysis of overall sample. Table 22 shows the mean, standard deviation, skewness, and kurtosis for students total linguistic explanation rubric scores with Table 22 Descriptive Statistics for Linguistic Explanation Scores Question M SD Skewness Kurtosis Flowerpots 1.37 1.11 0.13 -1.35 Ms. Hernandez 1.30 0.96 -0.14 -1.21 Calories 1.29 0.99 0.13 -1.07 Jeans Class 1.32 1.00 0.12 -1.08 Pizza 1.91 0.79 -0.55 0.11 String 0.67 0.74 0.74 -0.39 Balloons 1.21 1.12 0.26 -1.36 Flour 0.91 0.88 0.36 -1.18 Calculator 1.30 0.95 0.04 -1.02 School Lunch 0.84 0.89 0.54 -1.02 Pencils 1.31 1.02 0.15 -1.14 Birdseed 0.78 0.76 0.52 -0.70 Total 14.20 7.32 0.09 -0.97 Note. N=237, max. score for individual questions = 3 points, max. total score = 36 points. 221


a total possible score of 36, as well as for each of the twelve questions with a possible score of 3 points. The data for students total linguistic explanation scores yielded a mean of 14.20 and a standard deviation of 7.32. The skewness value of 0.09 is nearly normal, but the kurtosis value of -.96 indicates a noticeably platykurtic distribution. This indicates a flat or broad curve and coincides with the large size of the standard deviation relative to the mean. Although the kurtosis value is high, factorial ANOVA is believed to be relatively robust to violations of this assumption (Glass, Peckham, & Sanders, 1972). Most of the distributions for the scores for individual questions were also platykurtic, but not skewed. Because the total linguistic explanation score distribution is not skewed and not extremely leptokurtic, and because of the relatively large sample size, factorial ANOVA is believed to be fairly robust in this situation. The mean of 14.20 points represents an average score of just under 40%, compared to an average score for mathematics performance of 56%. These low scores should not be interpreted as meaning that students failed the test on both measures. Holistic rubric scores are not an interval measure and are not intended to be converted to percentage scores. However, the percentage scores do provide a means of comparing student performance on multiple tasks that are scored with a holistic rubric. The percentages can be used to conclude that, overall, students did much better with the mathematics than they did with the linguistic explanations. 222
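The two percentages quoted above come from dividing each mean by its maximum possible score:

$$
\frac{14.20}{36} \approx 0.39 \;(\text{just under }40\%), \qquad \frac{26.76}{48} \approx 0.56 \;(\text{about }56\%)
$$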


The pizza question garnered the most complete linguistic explanations of any question, scoring more than a half-point higher on average than any other question. Given that this problem did not call for computation, but for students to make sense of the fact that the size of a fractional part is dependent on the size of the whole, this result is not surprising. The string and birdseed problems, by far the most difficult items on the test, resulted in the lowest mean scores for linguistic explanations. In many instances, students were not able to make an informed attempt at solving these two problems, therefore it follows that they would be ill-equipped to explain their problem-solving processes. Analysis by group. Descriptive statistics of students by-group performance on each question are provided in Appendix O. A descriptive analysis of the by-question data for boys and girls reveals that girls achieved higher scores for their linguistic explanations on nine of the twelve test items. Boys linguistic explanations received higher scores for the calories, string, and pencil questions, ranked 3, 12, and 9 in order of difficulty, with 12 being the most difficult. The standard deviations of girls scores were higher than boys for all questions except the calorie and pizza items. All three mathematics ability groups had the highest linguistic explanation scores for the pizza question, and the lowest scores for the string question. For the questions with an overall difficulty ranking of 4 th -9 th the rank order of the linguistic explanation scores varied widely by mathematics ability level. The same is true for the sample when it was grouped according to reading ability. 223


Descriptive Analysis of Total Linguistic Explanation Scores Table 23 shows the mean and standard deviation of students total linguistic performance scores grouped by gender, reading ability, and mathematics ability. The data show that girls total linguistic explanation scores were slightly less than one point higher than boys, and slightly more dispersed than boys. Students classified with higher mathematics ability scored higher than students of lower ability, and the standard deviations of the scores were Table 23 Total Linguistic Explanation Scores by Group Group n M SD By Gender Boys 129 13.82 7.16 Girls 108 14.66 7.51 By Mathematics Ability High 57 18.75 7.53 Middle 129 13.41 6.81 Low 51 11.12 5.97 By Reading Ability High 73 17.58 7.62 Middle 134 13.30 6.68 Low 30 10.03 6.06 Note. N=237, max. score = 36 points. 224


ordered according to ability, with the highest ability group having the highest standard deviation and the lowest ability group having the lowest standard deviation. Similarly, students of higher reading ability received higher linguistic explanation scores than students of lower reading ability. The scores in the low reading ability group had the least amount of dispersion. Figure 3 presents the box and whisker plots for linguistic explanation scores grouped by the three different independent variables. The plots for boys' and girls' linguistic explanation scores are similar, with a larger H-spread and a higher median score for girls. When grouped by mathematics ability, the plots reveal that the H-spread of the low ability group is smaller than that of the other two groups, indicating that the middle half of the low ability group had a smaller

Figure 3. Box and whisker plot of total linguistic explanation scores grouped by gender, mathematics ability, and reading ability.


range of scores than that of the other two groups. The maximum, quartile 3, and median scores followed a pattern whereby the higher ability groups had higher scores. Both the middle and low mathematics ability groups had minimum scores of zero and similar quartile 1 scores. The three plots for the three levels of reading ability follow a pattern similar to that formed by using mathematics ability as the grouping variable, except that the minimum and first quartile marks also follow a pattern according to ability. Inferential Statistics Factorial ANOVA Results In order to suggest that differences in gender, mathematics ability, and reading ability would result in differences in the level of students linguistic explanations in the population, chance must be ruled out as a plausible explanation for the observed differences in the sample. To rule out the likelihood of a chance explanation, a 2x3x3 factorial ANOVA was conducted with an alpha level set at .05 for each effect. The degree to which the Type I error rates are actually controlled to the specified alpha level depends on how adequately the data meet the assumptions of independence, normality, and equal variances. The assumption of independence was met by ensuring that different observations came from different individuals. The study test was administered individually and each student completed her/his work alone. The descriptive statistics indicate that the distribution of scores was noticeably leptokurtic, but due to the large sample size, factorial ANOVA is believed to be relatively robust to the violation of this assumption. To assess the assumption of equal variances, 226


the Levenes test for homogeneity of variances was conducted with the result of F(12, 224)=1.30, p=0.2173. The p-value signifies that the null hypothesis of homogeneous variances cannot be rejected, thereby indicating that the assumption of equal variances was not violated. Based on this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA for linguistic explanations. The cell sizes for this study were not equal. If the disparity in cell sizes was not deemed to be the result of differences in the population, the regression approach (Type III) to analyzing the sums of squares would have been in order. However, in accordance with the population, it was expected that most of the sample would collect across the diagonal that represented identical categorization for both mathematics and reading ability. It was also expected that instances of a student being classified as high in one subject area and low in the other would be rare. Because the sample cell distribution was judged to be indicative of the distribution in the population, either the Type I (hierarchical) or Type II (experimental) method of computing sums of squares was deemed appropriate. The hierarchical method is reserved for situations in which there is an established order for the effects as is the case for this question. For the hierarchical method, each effect is adjusted only for those preceding it in the order. For this question, gender, as a demographic variable, was placed first in the ordering because although gender may affect students mathematics and reading abilities, the reverse argument is not viable. Reading ability was placed second as it is more closely tied to the outcome variable of linguistic explanations 227


than is mathematics ability. Reading and writing are two facets of literacy that are inextricably linked, and a students ability in one area is directly related to her/his ability in another (Routman, 1994; Spivey & King, 1994). Because the relationship between mathematics ability and linguistic explanations of problem solving has not been clearly established in the literature, this variable was placed third in the ordering. The obtained F(12,224)=4.02, p<.0001 for the overall F-test indicates that one or more of the independent variables is significantly related to the mathematics performance scores. The R-square value of 0.1772 indicates that about 18% of the variance in the scores can be explained by the independent variables. The results of the main and interaction effects for the three-way factorial ANOVA are presented in Table 24. The results of the factorial ANOVA for linguistic explanations indicates that both reading ability and mathematics ability are statistically significant with F(2,236)=16.14, p=<.0001 for reading ability, and F(2, 236)=6.16, p=.0025 for mathematics ability. As a result, the null hypotheses that reading ability and mathematics ability are not related to the quality of students linguistics explanations can be rejected. This signifies that the observed average difference between the linguistic explanation scores of high, middle, and low reading ability groups and high, middle, and low mathematics groups is large enough to conclude that a difference exists in the population. None of the interaction effects were significant. The omega squared value for the main effect of reading ability shows that an estimated 11% of the variance in the population is associated with students 228


reading ability level. The large Cohens f value of 0.37 indicates that the group means typically deviate from the grand mean by 0.37 standard deviation units, and represents a large effect size. The omega squared value for mathematics ability signifies that 4% of the variance in the linguistic scores of the population is estimated to be related to students mathematics ability. The Cohens f effect size estimate of 0.23 reveals that the means of students grouped by mathematics ability deviate from the grand mean by 0.23 standard deviation units. Although the interaction effect of gender*reading ability is shown to have a small effect size of 0.09 according to Cohens f, the differences in group means were not found to be statistically significant. The sample size may have resulted Table 24 Factorial ANOVA Results for Total Linguistic Explanation Scores Source df MS F p 2 f Gender 1 41.06 0.88 0.3479 0.00 0.06 Reading 2 749.01 16.14 <.0001 0.11 0.37 Gender*Reading 2 47.21 1.02 0.3633 0.00 0.09 Mathematics 2 285.97 6.16 0.0025 0.04 0.23 Gender*Math 2 8.06 0.17 0.8407 -0.01 0.04 Reading*Math 3 1.41 0.03 0.9701 -0.01 0.02 Gender*Reading*Math 1 14.59 0.31 0.5756 0.00 0.04 Note: N=237, 2 = omega squared, f=Cohens f measure of effect size. *p<.05. 229


in the study being underpowered and unable to rule out chance as the cause of the differences that do exist. The F test serves to indicate whether an effect is significant, but it does not tell whether all differences among the means of all levels of the variable are significant. Because the main effect of reading ability was significant but the interaction effect was not significant, the Tukey test was used in the post-hoc analysis. This test helps clarify the practical significance of the relationship between reading ability and the quality of students linguistic explanations as found in the F-test, while maintaining the overall Type I error level at .05. The results of the follow-up test indicated that the difference between the means of all ability groups was significant. On average, high reading ability students earned total linguistic explanation scores that were 4.28 points higher than middle ability students. In turn, middle ability students scores were an average of 3.27 points higher than those of low ability students. These results indicate that students of high reading ability provided significantly more complete and accurate linguistic explanations of their problem solving processes than did students of middle or low reading ability, and the middle ability students explanations were significantly more complete and accurate than those of low ability students. The results of the Tukey follow-up tests for mathematics ability yielded results that were different than those found for reading ability. The difference in group means between the high and middle mathematics ability groups (5.34) was significant as was the difference in means between the high and low ability groups (7.64). Therefore, the observed mean differences in the linguistic 230

PAGE 246

explanation scores between the high and middle and high and low ability groups were large enough to conclude that these differences truly exist in the population. The difference in means between the middle and low ability groups (2.29) was too small to reach significance. Multiple Regression Analysis Introduction. The sample for this analysis consisted of 108 girls and 129 boys for a total of 237 students. The three main effect predictor variables considered for the analysis were gender (dummy coded with girl=1 and boy=0), FCAT mathematics scale score (abbreviated as FCATM and ranging from 100-500), and FCAT reading scale score (abbreviated as FCATR and ranging from 100-500). The outcome variable for this analysis was students total linguistic explanation score with a range of 0-36 points and an abbreviation of TOTALLE. An analysis of the descriptive statistics for the predictor variables was conducted for question two, and because the same variables were used for this question, that analysis will not be repeated here. To more closely follow the format of the factorial ANOVA, the multiple regression was first run with the three main effects and two additional interaction effects (gender*FCATM and gender*FCATR). Because the interaction effects were not significant, and the inclusion of the interactions unduly complicates the interpretation of the results, they were excluded from the analysis. The following results and discussion are based solely on the three main effects, but a table summarizing the results of the model with the interaction effects is provided in Appendix T. 231

PAGE 247

Multiple regression results. A summary of the multiple regression results is presented in Table 25. Students FCAT mathematics and reading scores were found to contribute significantly to the predictive utility of the regression equation. The obtained R 2 value for the model was .1826, indicating that about 18% of the variability in students total linguistic explanation scores was explained using this set of predictors. Because R 2 tends to be an overly optimistic estimate of the variability in the population that would be accounted for by this set of predictors, the adjusted R 2 was examined. At a value of .1720, it showed a small amount of shrinkage. Table 25 Summary of Multiple Regression Analysis for Linguistic Explanations Variable b SE t p Intercept -13.4190 0.0 3.8604 -3.48* 0.0006 Gender 1.2113 0.0826 0.9073 1.34 0.1832 FCAT Mathematics Score 0.0323 0.1910 0.0139 2.33* 0.0209 FCAT Reading Score 0.0476 0.2735 0.0141 3.39* 0.0008 Note. N=237. b=regression coefficient, =standardized regression coefficient, SE=standard error, * indicates significance at =.05 level. Although gender did not provide a significant contribution to the prediction equation, because the data was easily obtained, and would be gathered if a 232

PAGE 248

study of this type was repeated, there is no disadvantage to leaving the variable in the equation. Cohens effect size f 2 = R 2 /(1-R 2 ) was computed to be 0.2234, a medium to large effect size. The standard error of measure was 6.6577, indicating that on average, the prediction equation will yield a predicted total linguistic explanation score that errs by a little more than 6.6 points. The obtained prediction equation was: TOTALLE = -13.4190 +1.2113*GENDER + 0.0323*FCATM + 0.0476*FCATR To interpret the outcome of the regression equation, consider two students of like gender who have identical FCATR scores. A 100-point increase in the FCATM score would result in a 3.23 higher predicted linguistic explanation score. Similarly, for two students of the same gender with identical FCATM scores, a 100-point increase in the FCATR score would result in a predicted increase in the linguistic explanation score of 4.76 points. Because the dummy coding for gender assigned girls the score of , and boys the score of , the prediction equation will yield a 1.21 point higher predicted TOTALLE score for girls, holding FCATM and FCATR constant. The standardized regression coefficient of 0.0826 for gender indicates that girls predicted TOTALLE score will be a small 0.0826 standard deviations higher than boys. The standardized regression coefficient for FCATM of 0.1910 means that a 1 standard deviation increase in FCATM would result in a predicted 0.1910 standard deviation increase in TOTALLE. The coefficient of 0.2735 for FCATR indicates that a 1 standard deviation increase in FCATR would result in a predicted 0.2735 standard deviation increase in TOTALLE. Standardizing the 233

PAGE 249

coefficients allows for a direct comparison of the strength of the relationship of the predictors on the outcome variable, and shows that FCATR was more influential on TOTALLE than FCATM. The squared semi-partial regression coefficients for each predictor reveal that gender uniquely accounted for only 0.6% of the variability in TOTALLE, whereas FCATM and FCATR accounted for 1.9% and 4.0% respectively. Assumptions. Regression analyses are based on many assumptions that should not be violated. The data were screened and an evaluation of each assumption was conducted. To assess the assumption of homoscedasticity of errors, the residuals were plotted with the predicted values. The resulting plot indicated that the assumption was not violated. The residuals were also found to be relatively normal with skewness and kurtosis values of -0.14 and -0.92, respectively. A further analysis of the residual plot shows a linear relationship between the values, and based on the design of the study, there is no reason to believe that the residuals are not independent. Because the outliers were removed based on the descriptive statistics prior to the analysis, the Cooks D values did not detect any additional outliers. Another assumption is that the predictors were measured without error. The variable of gender was self-reported, and no incidence of error has been detected. Due to the rigorous testing procedures association with the FCAT test, the measurement error associated with these variables is not deemed to be in violation of the assumption. The predictors of FCATM and FCATR cannot be considered fixed, but this is the case in many studies and multiple regression is considered robust 234

PAGE 250

to violations of this assumption. In conclusion, based on the analysis of the assumptions, multiple regression was an appropriate procedure for analyzing the data for this question. Conclusion. Students FCAT mathematics and reading scores were found to be significant predictors of the quality and completeness of students linguistic explanations of their problem-solving processes on the study test. Their inclusion in the regression equation significantly increased the equations ability to predict the outcome variable successfully. Although the variable of gender was not a significant contributor to the regression equations predictive ability, it did uniquely account for some of the variability in the outcome and will therefore remain in the model. If there was a significant cost or time factor in gathering this data, it would not be advisable to leave gender in the regression equation. However, because gender data is key to other elements of the study, and would be collected if this study were replicated, there is a small benefit, and certainly no harm in leaving it in the regression equation. The results of the regression analysis confirm the findings of the factorial ANOVA for mathematics ability and for reading ability, even though the variables of reading ability and mathematics ability were slightly different for the two analyses. For the factorial ANOVA, students ability level was a categorical variable determined through a combination of their FCAT level scores and the teachers ratings of their ability. For the multiple regression, students FCAT scale scores, a continuous variable, were used as the measure of their ability. Both analyses confirm that students reading ability and mathematics ability are 235

PAGE 251

significantly related to the quality of the linguistic explanations of their problem-solving processes. Reading ability was also shown to be more strongly related to the outcome variable than mathematics ability. Qualitative Analysis of Students Linguistic Explanations Every item on the study test ended with a statement similar to Show your work, then explain how you know you are correct. During the pre-test training discussions, students were told that the explanations of their mathematical problem-solving strategies and procedures were as important as their computational work. Whenever mathematical symbols and numerals were included in the text of the explanation, they were regarded as text for evaluation and scoring. Students linguistic explanations revealed a wide range of perceptions as to what it means to explain how you know you are correct. The purpose of this section is to use unedited samples of student work to illustrate the various approaches students took to providing explanations of their work. Some students were very conscientious about explaining every step of the problem-solving process and relating their explanations to the problem context. Each of the following examples received the highest possible score of 3 points for linguistic explanation. For each of the examples, the name of the problem precedes the students explanation and is followed by the childs gender. Ms. Hernandez Problem: There were two substitutes. I know Im right because if there are eight people in each team and 34 students you think of how many time eight goes into 34 without 236

PAGE 252

going over. Which is four times. Eight times four equals thirty-two. Plus two equals thirty-four. Thats two substitutes. Girl. Pencil Problem: I knew of twenty was 5 so I knew Brendan had 15 pencils so add them together and you get 20. Boy. Flour Problem: I got this answer by first multiplying 1 whole cup x 3 which equals 3 cups. Then I did 1/3 times 3 which equals one whole. 3+1=4. There where 4 cups needed to make 3 batches. Girl. Some students attempted to explain their work, but did not relate the explanation back to the context of the problem. A linguistic explanation that described all the computational steps but lacked context received a linguistic explanation score of 0-2 points. The first two examples received a score of two points, but the third example did not. This student attempted to explain what he had done, but his explanation used incorrect mathematical terminology and did not address the problem context. String Problem: It is right because 3/4 is equal to 6/8 and 1/8 goes into 6/8 6 times. Girl. Ms. Hernandez Problem: I divided 8 out of 34 and I got 4r 2 ! Boy. Ms. Hernandez Problem: I subtrated [sic] the factor of 34 and 8 and I got the sum of 26. Boy. Other examples of student work illustrate that some children were very confident about the work they had produced even though the work was incorrect. For both of the next examples, the students gave themselves the highest 237

PAGE 253

possible self-assessment score. The first sample contains some correct work, even though the work does not contribute to solving the problem. Making sense of the explanation proves to be as difficult a task as writing it must have been. The second example shows that the child completely misunderstood the question and believed it to be asking what is the difference in the number of pounds instead of what is the difference in the cost per pound. String Problem: I figured this in a difficult way. First, I found 3/4 of a yard is 2 1/4. I took the 1/4 and made it 2/8. I found that 8 1/8 was 1. Double that makes 16 1/8 which is 2. 16+2=18. Boy. Birdseed Problem: There 1 pound difference. Boy For some students, explaining how they know they are correct evokes a statement about their competence in the computational skill required to solve the problem. Statements such as I know my answer is correct because Im very good at mutulbling [sic], or I know its right because Im verry [sic] good with division and subtraction bear this out. Others explained their problem-solving strategies with generic statements such as I looked for keywords and got an idea of what I was supposed to do, or I know Im right I did the math. Some of the most interesting responses were those that provided a glimpse of a childs personality, such as his or her poetic nature, unusual food tastes, or notions about the inherent differences between boys and girls. Ms. Hernandez Problem: How I got this answer was I divided 8 and got to [sic] so then I decided to write it out for myself and you. Girl. 238

PAGE 254

Pizza Problem: Jose coule [sic] be right because if he has onions, broccoli and cheese but Ella has just cheese than [sic] Jose is right. Boy. Pizza Problem: One of there pizzas could be big or small. In this case I make Ellas small because she is a girl and girls some times dont eat as much as boys. So Jose could have eaten more then Ella. Girl. The last examples illustrate an unusual approach to explaining how you know your answer is correct. For these students, an overall feeling of confidence in their abilities, a positive attitude, or faith in a higher power shows them that they are correct in their thinking. Calculator Problem: My two ways are correct because I can belive [sic] in myself to get it right. Girl. Balloon Problem: I know I did this right because I had a really good feeling about this. Boy. Birdseed Problem: The power of math gives me the correct answer I know is right. But also the power of my brain that can work the problem. Girl. Question Four: Findings for Self-Efficacy The findings in this section address the following research question: To what extent is the relationship between students feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported 239

PAGE 255

rating scale, and their actual performance on a mathematical problem solving test as measured by a holistic scoring rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? The data for this question, students self-efficacy mean bias scores, were obtained through administration of the study test instrument consisting of twelve constructed-response mathematical problem-solving items. Mean bias is a measure calibration; the accuracy of students self-efficacy feelings. The mean bias was calculated as the average difference between students self-efficacy self-rating and their actual performance. A positive mean bias indicates a situation wherein a students self-efficacy exceeds his/her actual performance on the test overall. A negative mean-bias score indicates that the students self-efficacy was, on average, lower than her/his actual performance on the test. To obtain the self-efficacy data, each item contained a self-reporting likert-type scale immediately following the question. The scale, formatted using a range of five faces from very happy to very sad, asked students to rate how well they thought they could solve the problem before they began working on the problem. After the test administration, the students responses were converted to numerical scores ranging from zero to four, with four indicating the highest level of self-efficacy. To obtain each students mean bias score, the average of all twelve mathematical performance scores was subtracted from the average of all twelve self-efficacy scores. The self-efficacy mean bias scores had a potential range of -4 to +4. 240

PAGE 256

Descriptive Statistics Descriptive Analysis of Total Self-Efficacy Although the variable of interest for this research question was self-efficacy mean bias, a clear picture of the data cannot be achieved without an understanding of students total self-efficacy scores. The first of the following three sections details how the students total self-efficacy scores varied according to the grouping variable. The second section explains how the total samples scores varied by test question. The third section looks at group data for each of the twelve questions. Overall analysis of groups. Table 26 shows the mean and standard deviation of students total self-efficacy organized by gender, by reading ability, and by mathematics ability. Boys total scores surpassed girls by 4.89 points, and the standard deviation of their scores was much lower at 6.70 as compared to 8.02 for girls. When mathematics ability was used as the grouping variable, the results show that the students of highest ability had the highest average total self efficacy scores followed by students of middle and low ability. The variability of the group scores was the highest for the low ability group and the lowest for the high ability group. When the sample was grouped according to reading ability, the mean scores decreased as student ability decreased. The variability of scores was the highest for the middle reading ability group. Analysis of overall sample by question. Although the data being analyzed inferentially in this question are students self-efficacy mean bias scores, a look at students actual self-efficacy ratings for each question provides information 241

PAGE 257

Table 26 Total Self-Efficacy Scores by Group Group n M SD By Gender Boys 129 37.95 6.70 Girls 108 33.06 8.02 By Mathematics Ability High 57 40.02 6.32 Middle 129 35.30 7.32 Low 51 32.00 7.93 By Reading Ability High 73 39.11 7.14 Middle 134 34.58 7.62 Low 30 32.60 6.79 Note. N=237, max. score = 48 points. that clarifies the overall self-efficacy picture. Table 27 shows the mean, standard deviation, skewness, and kurtosis for students total self-efficacy scores (possible range of 0-48) and students self-efficacy scores by question (possible range of 0-4). The data for students total self-efficacy scores yielded a mean of 35.73, a standard deviation of 7.71, and a relatively normal distribution with skewness and kurtosis values of -0.38 and -0.51 respectively. As expected, the distributions for 242

PAGE 258

Table 27 Descriptive Statistics for Self-Efficacy Scores by Question Question M SD Skewness Kurtosis Flowerpots 3.75 0.52 -1.99 3.09 Ms. Hernandez 3.23 1.00 -1.46 1.79 Calories 3.41 0.76 -1.14 0.68 Jeans Class 3.65 0.73 -2.23 4.89 Pizza 2.65 1.27 -0.63 -0.58 String 1.73 1.30 0.14 -1.08 Balloons 3.19 0.83 -0.98 1.04 Flour 2.96 1.13 -0.92 0.01 Calculator 2.67 1.39 -0.79 -0.65 School Lunch 3.15 1.05 -1.26 0.96 Pencils 2.54 1.42 -0.58 -0.97 Birdseed 2.79 1.27 -0.90 -0.17 Total 35.73 7.71 -0.38 -0.51 Note. N=237, max. score for individual questions = 4 points, max. total score = 48 points. individual questions for which students indicated relatively high feelings of self-efficacy were negatively skewed. Several of the distributions for individual questions were noticeably leptokurtic, a characteristic that is indicative of a spiked curve with many more extreme scores than would be found in a normal distribution. 243

PAGE 259

Students level of self-efficacy for individual questions was a fairly accurate indicator of their success on the problems. The items for which students felt the most and least self-efficacious were the flowerpot and string problems, respectively. These are also the items for which students actual scores were the highest and lowest. For six of the twelve items, the mean self-efficacy score was over 3.0, indicating that students felt fairly confident in their ability to be successful for half of the items. Analysis of groups by question. Descriptive statistics for students by-group scores on each question are presented in Appendix P. Boys feelings of self-efficacy exceeded that of girls for all twelve test items. There was more variability in girls self-efficacy scores for ten of the twelve items. The only exceptions were the balloon and string questions, the second easiest and the most difficult items, respectively. When the students were grouped by mathematics and reading ability, all groups were consistently the least self-efficacious about the string question. The question that showed the largest rank-order difference when mathematics ability was the grouping variable was the pizza item. When the questions were ranked according to level of self-efficacy for each group, with a rank of 1 being given to the question for which the students in that group had the highest level of self-efficacy, the high, middle and low ability students ranked the pizza question 10 th 3 rd and 9 th respectively. The question that showed the greatest rank order differences when reading ability was the grouping variable was the Ms. Hernandez item. High, middle, and low ability students ranked it 3 rd 5 th and 8 th respectively. 244

PAGE 260

Descriptive Analysis of Self-Efficacy Mean Bias Overall analysis of groups. Table 28 illustrates the relationships between students self-efficacy mean bias scores and their group membership. The bias score for boys, at 0.89, was over three-tenths of a point higher than that of girls. Table 28 Self-Efficacy Mean Bias Scores by Group Group n M SD By Gender Boys 129 0.89 0.60 Girls 108 0.58 0.64 By Mathematics Ability High 57 0.49 0.46 Middle 129 0.77 0.63 Low 51 0.99 0.73 By Reading Ability High 73 0.56 0.54 Middle 134 0.78 0.64 Low 30 1.07 0.71 Note. N=237, self-efficacy mean bias scores range from -4 to +4. The variability of boys and girls scores was nearly equal with standard deviations of 0.60 and 0.64. For this sample, although both boys and girls 245

PAGE 261

average mean bias scores were positive and indicated overconfidence in ability to complete the subject test items, girls were more accurate in judging their ability to complete the questions successfully. When the sample was organized according to mathematics ability, the students in the high ability group stood out as the most accurate assessors of self-efficacy with an average mean bias of 0.49. The low ability group rated their own self-efficacy an average of 0.99 points higher than their actual performance on each test item. The standard deviations of the groups were ranked according to ability level with the highest group having the least variability (SD=0.46), and the lowest group having the most variability (SD=0.73). When the grouping variable was changed to reading ability, students of high ability were shown to be much more accurate assessors of their own ability to complete a task than students of middle or low ability. The average mean bias score in the low reading ability group was 1.07, indicating that poor readers tended to rate their ability to complete a task successfully more than one point higher than their actual performance on that task. The standard deviations of the scores among the reading ability groups ranged from a low of 0.54 for the high group to a high of 0.71 for the low group. Although these patterns appear to be strong in the sample data, descriptive statistics alone cannot be used to infer that these differences are also present in the population. The box and whisker plots presented in Figure 4 illustrate that when the students were grouped by gender, the shape of the score distributions was similar. The H-spread was slightly larger for girls, but the first quartile 246

PAGE 262

-1-0.500.511.522.53BoysGirlsH MathM MathL MathH RdgM RdgL RdgGroupSelf-Efficacy Mean Bias Scores Q1 min median max Q3 Figure 4. Box and whisker plot of self-efficacy mean bias scores grouped by gender, mathematics ability, and reading ability. represented a wider range for boys. The shape of the plots for the grouping variable of mathematics ability was very different for the high ability group than for the other two levels. Although the H-spread of the three plots is similar, the whisker representing the upper quartile for the high ability group is only a fraction of the length of the other two plots. Even the most overconfident of the high ability students was not nearly as overconfident as some of the members of the middle and low ability groups. The fact that the distribution of scores for the high ability group is centered closer to zero means that these students were much more accurate in their feelings of self-efficacy than the lower ability students. Observation of the plots for the grouping variable of reading ability shows that the 247

PAGE 263

best readers were less likely to be extremely overor under-confident than students of middle or low ability. Analysis of overall sample by question. Table 29 shows the self-efficacy mean bias score averages for the entire test as well as for each individual test item. The mean bias was found by subtracting the mean mathematical performance score from the mean self-efficacy score. As a result, a positive bias score indicates students overconfidence in their ability to complete the item successfully, and a negative score shows that students were under-confident. As shown by the results, students were overconfident in their ability to complete all of the test items successfully. Students were the most accurate in their self-efficacy assessments for the balloons and pizza questions which ranked second and sixth, respectively, in difficulty according to the mathematical performance scores. The birdseed problem, ranked as the second most difficult problem, represented the most extreme level of overconfidence among all the test items. This item asked students to determine the difference in price per pound between two types of seeds. To solve, students had to divide two different bulk prices by the number of pounds to determine a price per pound, then find the difference between the two per-pound prices. The analysis of problem-solving processes determined that fully 43% of all students erroneously solved the problem simply by subtracting the two bulk prices to find the difference. If these students believed this to be the correct method, it would explain the extremely high level of overconfidence. 248

PAGE 264

Table 29 Self-Efficacy Mean Bias Scores by Question Question M SD Flowerpots 0.67 0.93 Ms. Hernandez 0.55 1.11 Calories 0.58 1.05 Jeans Class 0.96 1.11 Pizza 0.35 1.60 String 0.84 1.38 Balloons 0.28 1.16 Flour 0.92 1.19 Calculator 0.88 1.47 School Lunch 0.94 1.28 Pencils 0.68 1.41 Birdseed 1.33 1.54 Total 0.75 0.64 Note. N=237, self-efficacy mean bias scores range from -4 to +4. Analysis of groups by question. Appendix Q presents the mean and standard deviation of students scores for individual questions organized by group. A review of the means and standard deviations of boys and girls self-efficacy mean bias scores by question yields some interesting patterns. Boys average mean bias scores were higher than girls on all twelve of the study test 249

PAGE 265

items. The standard deviations of girls scores were higher for nine of the twelve items. Boys standard deviations of scores were higher for the flour, birdseed, and string questions, ranked 8 th, 11 th and 12 th in overall order of difficulty. These results indicate that boys were consistently more overconfident than girls across test items of varying difficulty and context, and that boys bias scores were more tightly gathered around the group mean. Although girls were more accurate in their judgments of self-efficacy for all questions, their higher deviations from the mean indicate less conformity within their group. Inferential Statistics Factorial ANOVA Results In order to suggest that differences in gender, mathematics ability, and reading ability would result in differences in students self-efficacy scores in the population, chance must be ruled out as a plausible explanation for the observed differences in the sample. To assess the possibility that the differences are merely a matter of chance, a 2x3x3 factorial ANOVA was conducted with the alpha level set at .05 for each effect. The degree to which the Type I error rates are actually controlled to the specified alpha level depends on how adequately the data meet the assumptions of independence, normality, and equal variances. The assumption of independence was met by ensuring that different observations came from different students. The study test was administered individually and each student completed her/his work alone. The descriptive statistics indicate that the data were distributed relatively normally, and the assumption of normality was not violated. To assess the assumption of equal 250

PAGE 266

variances, the Levenes test was conducted with a result of F(13, 226)=1.32, p=0.2091. The null hypothesis which states that there are no significant differences in the variances of any of the groups cannot be rejected, indicating that the variances were substantially equal. Based on this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA. The cell sizes for this study were not equal. If the disparity in cell sizes was not deemed to be the result of differences in the population, the regression approach (Type III) to analyzing the sums of squares would have been appropriate. However, in accordance with the population, it was expected that most of the students in the sample would be categorized similarly for mathematics and reading, resulting in few students in the cells representing a high rating for one subject and a low rating for another. Because the sample cell distribution was judged to be indicative of the distribution in the population, and not the result of the methodology of this study, either the Type I (hierarchical) or Type II (experimental) method of computing sums of squares was in order. The hierarchical method is reserved for situations in which there is an established order for the effects. That was not the case for this question. It is clear that gender, as a demographic variable, would be placed first in the ordering because, although gender may affect students mathematics and reading abilities, the reverse argument is not viable. The issue is that the literature on the relationship between reading ability, mathematics ability, and self-efficacy is nearly nonexistent, resulting in the lack of a theoretical basis for ordering the 251

PAGE 267

variables. For this reason, the Type II experimental method for computing sums of squares was utilized for the factorial ANOVA for this question. The obtained F(12,224)=4.26, p<.0001 for the overall F-test indicates that one or more of the independent variables is significantly related to students self-efficacy mean bias scores. The R-square value of 0.1857 indicates that almost 19% of the variance in the scores can be explained by a combination of one or more of the independent variables. The results of the main and interaction effects for the three-way factorial ANOVA are presented in Table 30. The results of the factorial ANOVA for self-efficacy mean bias indicate that the main effects of gender F(1,236)=23.59, p=<.0001 and mathematics ability F(2,236)=5.13, p=0.0066 were both statistically significant. As a result, the null hypotheses that gender and mathematics ability were not related to students self-efficacy mean bias scores are both rejected. The F-test failed to reject all of the remaining null hypotheses for main and interaction effects. The omega squared values indicate that the combination of these two effects accounts for an estimated 11% of the variance in self-efficacy mean bias in the population. The Cohens f value for gender is medium-to-large at 0.31, and the Cohens f for mathematics ability, at 0.21, shows a medium-sized effect. Two of the interaction effects (gender*reading ability and reading*mathematics ability) showed small effect sizes according to the calculation of Cohens f. However, the F test did not determine that these differences were statistically significant. Whether or not the observed differences in self-efficacy mean bias were related to these interactions cannot be determined in this study. 252

PAGE 268

Table 30 Factorial ANOVA Results for Self-Efficacy Mean Bias Scores Source df MS F p 2 f Gender 1 8.23 23.59 <.0001 0.08 0.31 Reading 2 0.29 0.85 0.4309 0.00 0.08 Gender*Reading 2 0.54 1.55 0.2143 0.00 0.11 Mathematics 2 1.79 5.13 0.0066 0.03 0.21 Gender*Math 2 0.05 0.15 0.8640 -0.01 0.04 Reading*Math 3 0.47 1.36 0.2587 0.00 0.13 Gender*Reading*Math 1 0.24 0.68 0.4094 0.00 0.05 Note. N=237, 2 = omega squared, f=Cohens f measure of effect size. *p<.05. In the context of this study, there is no way to know whether an increased sample size, and the resulting increased power, would have resulted in these interactions proving to be statistically significant. Because two of the main effects were significant but the interaction effects were not significant, the Tukey follow-up procedure is an appropriate measure for determining whether the differences between all levels of mathematics ability were significant. As with the results for linguistic explanations, the follow-up test for self-efficacy indicates that the differences in the means between the high and low mathematics ability students (0.51) and between the high and middle ability students (0.28) were significant. The group means for self-efficacy mean bias of students of high, middle, and low mathematics ability were 0.49, 0.77, and 0.99, 253

PAGE 269

respectively. By combining the information gleaned from the Tukey test and the group means, the results indicate that students of high mathematics ability were significantly more accurate in their feelings of self-efficacy than were students of middle or low mathematics ability. Although students of all ability groups were overconfident in their feelings of self-efficacy, students of lower mathematics ability were more overconfident than students of high ability. Because the variable of gender only has two levels, a review of the means for boys and girls self-efficacy mean bias is the only follow-up to the F-test that is required. The average mean bias for boys was 0.89, indicating that, on average, boys rated their self-efficacy 0.89 points higher than their actual performance. In contrast, the average of girls self-efficacy mean bias of 0.58 indicates that, on average, girls rated their self-efficacy 0.58 points higher than their actual performance. These results show that both boys and girls were overconfident in their ability to complete the test items successfully, but boys were significantly more overconfident than girls. Although reading ability was related to students self-efficacy mean bias scores in the sample, the main effect for reading ability in the factorial ANOVA was not significant at F(2, 236)=0.85, p=.4309. The size of the sample may have been too small for the effect to reach significance. As a result, the null hypothesis that reading ability is not related to self-efficacy mean bias cannot be rejected. 254

PAGE 270

Multiple Regression Analysis Introduction. The predictors for this analysis of 237 students were gender, FCAT reading scores, and FCAT mathematics scores. Both of the FCAT scores range from 100-500, and the abbreviations used for this analysis were FCATR and FCATM for reading and mathematics respectively. The outcome variable was self-efficacy mean bias, abbreviated as SEMBIAS, and had a potential range of -4 to +4. Students mean bias scores were found by subtracting their average self-efficacy rating from their average mathematics performance score. A negative score indicates under-confidence, and a positive score shows overconfidence. The closer a mean bias score is to zero, the closer the childs self-efficacy ratings matched their actual performance. Because the prediction variables are the same for this analysis as for that of question one, the descriptive data is not repeated here. The multiple regression was initially run with the three main effects and two interaction effects, gender*FCATM and gender*FCATR. Because the interactions were not significant and do not serve to illuminate the relationships among the main effects and the outcome variable, they were not included in the final analysis or discussion. A table showing the results of the multiple regression with the interaction effects is provided in Appendix T. Multiple regression results. Table 31 presents a summary of the multiple regression analysis. All three variables, students gender, FCAT mathematics scores, and FCAT reading scores, contributed significantly to the ability of the regression equation to accurately predict students self-efficacy mean bias. The 255

PAGE 271

R 2 value for the model was 0.2068, indicating that nearly 21% of the variability in students self-efficacy mean bias was explained by the combination of the three predictors. Because R 2 tends to be an overly optimistic estimation of the models ability to predict the outcome variable in the population, the adjusted R 2 was observed. The adjusted value, at 0.1966, represents about a one percentage point decrease in the degree to which the regression equation is believed to be able to account for the variability of SEMBIAS in the population. Table 31 Summary of Multiple Regression Analysis for Self-Efficacy Mean Bias Variable b SE t p Intercept 3.0503 0.0 0.3315 9.20* <.0001 Gender -0.3381 -0.2646 0.0779 -4.34* <.0001 FCAT Mathematics Score -0.0026 -0.1746 0.0012 -2.16* 0.0319 FCAT Reading Score -0.0038 -0.2484 0.0012 -3.13* 0.0020 Note. N=237. b=regression coefficient, =standardized regression coefficient, SE=standard error, * indicates significance at =.05 level. Cohens effect size f 2 = R 2 /(1-R 2 ) was computed to be 0.2607, signifying a medium to large effect size. The standard error of estimate was 0.5717, which means that when the three predictors are used to predict a students self-efficacy mean bias, the resulting outcome will err an average of 0.5717 points. 256

PAGE 272

The obtained prediction equation was: SEMBIAS = 3.0503 + -.3381*Gender + -.0026*FCATM + -.0038*FCATR To interpret the equation, it is useful to examine the impact on the outcome when one variable is changed and the other two remain constant. For example, given one boy and one girl student with identical FCATM and FCATR scores, the girls predicted SEMBIAS score will be 0.3381 points lower than the boys. Considering that the intercept is a large positive number, the girls predicted SEMBIAS will be closer to zero, indicating a more accurate level of self-efficacy than the boys. If gender and FCATR are held constant, a 100-point increase in a students FCAT mathematics score will result in a 0.26 point decrease in the predicted self-efficacy mean bias score. This reduction represents an improvement in the level of accuracy of the childs self-efficacy ratings. If gender and FCATM are held constant, a 100-point increase in a students FCAT reading score will result in a predicted 0.38 point decrease in his/her SEMBIAS score. The standardized regression coefficients for the predictors provide a means of comparing their relative impact on the outcome variable by expressing the coefficients in terms of standard deviation units. The variable with the highest standardized coefficient was gender (-0.2646), meaning that other things being equal, boys SEMBIAS scores will be 0.2646 standard deviations higher than girls. The second highest coefficient was -0.2484 for FCATR, indicating that a 1 standard deviation increase in the FCAT reading score will result in a 257

PAGE 273

0.2484 standard deviation decrease in the students predicted SEMBIAS score, holding the other variables constant. This also represents an improvement in the accuracy of self-efficacy. The smallest coefficient was found for FCATM (-0.1746), and can be interpreted to mean that a 1 standard deviation increase in a students FCAT mathematics score will result in a predicted 0.1746 standard deviation decrease in a students SEMBIAS score. The squared semi-partial correlations were also examined to find each predictors unique ability to account for the variability in the outcome variable. The correlations were 0.06412 for gender, 0.01586 for FCATM, and 0.03328 for FCATR. These numbers mean that these variables uniquely accounted for 6.4%, 1.6%, and 3.3% of the variability in the SEMBIAS scores, respectively. Assumptions. The data were screened to ensure that none of the assumptions on which multiple regression analyses are based were violated. The assumption of homoscedasticity of errors says that the errors produced by the prediction equation should be no bigger, on average, at one end of the regression line than they are at the other. An examination of a plot of the residuals against the predicted values confirms that this assumption was not violated. The residuals were found to be normally distributed with very low skewness and kurtosis values (sk=-0.01, ku=0.08). Because the residual plot did not form a curve, the assumption of linearity was not violated. There was nothing in the design of the study that would lead to the conclusion that the residuals were not independent. A review of the Cooks D values did not lead to the removal of any outliers beyond those that were eliminated through the descriptive 258

PAGE 274

statistics analysis. The predictor variables are not believed to have a level of error that would violate that assumption. Finally, although the predictors may not be fixed, regression analysis is robust to violations of this assumption. In conclusion, the analysis of assumptions appears to have been an appropriate means of analyzing the data for this question. Conclusion. All three predictor variables were found to contribute significantly to the regression equations predictive ability, with gender being the variable of the highest contribution, followed by FCAT reading scores and FCAT mathematics scores, in that order. The findings related to gender and mathematics ability are not surprising as they confirm the results of the factorial ANOVA. The findings related to students FCAT reading scores were unexpected, and lead to further questions. Because the measure of reading ability for the factorial ANOVA was based on a combination of FCAT scores and teacher ratings, and because students were forced into three discrete levels of ability regardless of the range of their differences, the single continuous variable of FCAT reading scores may have proven to have a stronger relationship to SEMBIAS than the categorical variable of reading ability level. Qualitative Analysis of Students Rationale for Self-Efficacy Ratings The student interviews revealed some interesting observations about the source of some students self-efficacy. The verbal explanations of one female student indicated a stronger relationship between her confidence in her ability to solve the problem and her familiarity with the context of the problem, than with the mathematics involved in solving the problem. Several excerpts from her 259

PAGE 275

interview illustrate this point. When asked why she circled the happy face for self-efficacy for the pizza problem she explained: I mean, like, almost like all pizzas are different so like they could have ate at a different restaurant or one could have had more toppings than the other so thats how I kinda know that I might do really well on this one cause I eat pizza a lot. The flour problem involves calculating the amount of flour required for three batches of cookies, given the amount used for one batch. She explained her choice of the middle face for self-efficacy by saying: Well, one, one and one third cups of flour are needed in each batch of cookies. Well, in one and one third I think that like cause I dont do like a lot of cooking and sometimes its like a little bit taller than like what it should be so for like each batch of cookies The calculator problem tests students understanding of place value more so than the use of the calculator, but this student lacked confidence in her ability to solve the problem because I dont normally use a calculator so I didnt really know that much about it. Her lack of confidence was not indicative of her lack of understanding of the underlying mathematical concept as evidenced by her correct response of solving the problem by adding one hundred. She also lacked confidence in her ability to solve the school lunch money test item because I dont normally do money with my mom like for lunch so like she just puts it on a check like or something and like how could a mother have only one dollar bills? Mathematically, this problem involved multiplication and addition, and the student 260

PAGE 276

had no difficulty completing all the work correctly. According to her verbal responses, her low level of self-efficacy was based not on her ability to do the math but on her comfort level and familiarity with the context of the problem. Another female interview subjects self-efficacy was primarily tied to her experiences of working on math with her father. In explanation for circling a happy face for self-efficacy she answered, I just thought it was easy and I usually do math at home with my dad all the time. For another problem where she circled the kind-of-sad face, she explained Because when I used to, when I told you I used to do my math with my dad, this one got a little trickier and then I knew I was going to have trouble on this one. Her remarks indicate that her level of self-efficacy was somehow tied to an experience of doing a similar math problem with her father. One male interview subject expressed over-confidence before, during, and after providing explanations of faulty problem-solving processes. For example, he circled the slightly happy face for the string problem and provided this explanation for his work: I did three fourths minus one eighth which would equal two fourths because eight minus four is four and three minus one is two. And it says like he has Jim has three fourths of a yard of string and he wishes to divide it into one eighth of a yard of string so I just subtracted those two and got two fourths. He circled a happy face for self-assessment because, After I did it I was pretty sure I knew what I did. 261

PAGE 277

For the flour problem, the same male student circled the kind-of-happy face for self-efficacy because I wasnt really sure what I was doing. His explanation of his problem-solving processes proved that reply to be an understatement. Well it says she needs three batches of cookies and one and three, one third, one whole and one third, and its saying what, what number you have to add it by so I did if you have to make it into a whole it would have to be nine and seven, because if you put like eight and six it would come up to be nine sevenths, and it said she wanted to make three batches. Based on his written work and verbal explanation, it appears that he perceived this as a missing addend addition problem. Not only was his strategy incorrect, but his computational methods were also flawed. His self-assessment was another kind-of-happy face because I wasnt sure if I got it exactly right. This same student circled the really happy face for self-efficacy and self-assessment for the birdseed problem even though his strategy of subtracting the two prices was inaccurate. In explanation for his self-assessment, he responded I was positive I knew what I was doing on this and I was positive I got the answer correct. The other five interview subjects provided explanations for their measures of self-efficacy that were closely tied to their perceptions of their ability to answer the specific problems successfully. 262

PAGE 278

Question Five: Findings for Self-Assessment The findings in this section address the following research question: To what extent is students ability to assess their own performance on mathematical problem solving tasks as measured by a self-reported rating scale related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores? The data for this question, students self-assessment mean bias scores, were obtained through administration of the study test instrument consisting of twelve constructed-response mathematical problem-solving items. Students self-reported their self-assessment by circling one of five faces at the bottom of the work space when they were done working each problem. Mean bias is a measure of calibration which shows the accuracy of students self-assessments by finding the average difference between their self-assessment ratings and their actual performance. A positive mean bias indicates a situation where a students average rating of his/her own performance exceeds his/her actual performance average. A negative mean-bias score indicates that the students self-assessment was, on average, lower than her/his actual performance on the test. Each item contained a self-reporting likert-type scale at the bottom of the page following the work space to obtain self-assessment data. The scale, formatted using a range of five faces from very happy to very sad, asked students to rate how well they thought they solved the problem after their work was complete. After the test administration, the students responses were 263

PAGE 279

converted to numerical scores ranging from zero to four, with four indicating the highest level of self-assessment. To obtain each students mean bias score, the average of all twelve mathematical performance scores was subtracted from the average of all twelve self-assessment scores. The self-assessment mean bias scores had a potential range of -4 to +4. Descriptive Statistics Descriptive Analysis of Total Self-Assessment Scores Although the variable of interest for this research question was self-assessment mean bias, a clear picture of the data cannot be achieved without an understanding of students total self-assessment scores. The first of the following three sections details how the students total self-assessment scores varied according to the grouping variable. The second section explains how the total samples scores varied by test question. The third section looks at group data for each of the twelve questions. Overall analysis of groups. The means and standard deviations of students total self assessment scores, organized by gender, reading ability, and mathematics ability, are presented in Table 32. Based on twelve items with a maximum self-assessment score of four points on each item, a student who judged his/her work to be perfect would have a total score of 48. The data show that boys total scores were an average of more than four points higher than girls. Boys lower standard deviation also means that their scores were less dispersed than girls. When the students were grouped according to mathematics ability, the data show that students categorized as having high mathematics ability 264

PAGE 280

scored their work higher than students of lower ability. The variability of the scores follows the same trend with the lowest standard deviation being evidenced in the high ability group, and the highest deviation being found in the low ability group. When the grouping variable was changed to reading ability, the data show that better readers rated their work higher than poor readers, however, the dispersion of scores was the highest for the middle-ability readers. Table 32 Total Self-Assessment Scores by Group Group n M SD By Gender Boys 129 38.91 6.48 Girls 108 34.67 8.20 By Mathematics Ability High 57 41.40 5.57 Middle 129 36.78 7.24 Low 51 32.53 7.80 By Reading Ability High 73 40.42 6.36 Middle 134 35.83 7.77 Low 30 33.70 6.73 Note. N=237, max. score = 48 points. 265

PAGE 281

Analysis of overall sample by question. Table 33 shows the mean, standard deviation, skewness and kurtosis for students total self-assessment Table 33 Descriptive Statistics for Self-Assessment Scores by Question Question M SD Skewness Kurtosis Flowerpots 3.87 0.45 -4.41 21.87 Ms. Hernandez 3.51 0.87 -2.13 4.78 Calories 3.41 1.07 -2.00 3.17 Jeans Class 3.69 0.71 -2.84 8.78 Pizza 2.90 1.21 -0.99 0.11 String 1.70 1.52 0.25 -1.41 Balloons 3.29 0.88 -1.29 -1.46 Flour 2.94 1.27 -1.07 0.05 Calculator 2.67 1.47 -0.79 -0.79 School Lunch 3.40 0.98 -1.83 3.02 Pencils 2.64 1.60 -0.72 -1.14 Birdseed 2.96 1.33 -1.07 -0.05 Total 36.97 7.60 -0.63 -0.12 Note. N=237, max. score for individual items = 4 points, max. total score = 48 points. scores (possible range of 0-48) and their self-assessment scores for each question (possible range of 0-4). The distribution of students total self-assessment scores is roughly normal with a skewness value of -0.63 and a 266

PAGE 282

kurtosis value of -0.12. Most of the distributions for individual questions are negatively skewed and leptokurtic, indicating a preponderance of high scores, and a spiked distribution. This is especially true for the items for which students rated their work the highest. The items for which students self-assessment scores were the highest (flowerpot) and lowest (string) were identical to those for which students received the highest and lowest mathematical performance scores. The flowerpot item is also characterized by the smallest level of variability, with the pencil item showing the greatest variability of self-assessment scores. Analysis of groups by question. Descriptive statistics for students by-group scores for each question are presented in Appendix R. Boys were more confident than girls about the success of their problem-solving efforts for eleven of the twelve questions, but had the same self-assessment mean for the pizza item. In addition, the variability of girls scores was higher for ten of the twelve test items, the only exceptions being the pizza and string questions. Both boys and girls felt the most confident about their work on the flowerpot question and the least confident about their work on the string question. When grouped according to mathematics ability, students had the most varied rank order for the balloon question. With a rank order of one representing the question for which a group felt the most confident in their work, the high, middle, and low ability groups ranked the question 9 th 6 th and 3 rd respectively. When students were grouped according to reading ability, the greatest variability 267

PAGE 283

in rank order was found for the calories question with the high, middle, and low ability groups ranking it 5 th 3 rd and 7 th respectively. Descriptive Analysis of Self-Efficacy Mean Bias Overall analysis of groups. Students self-assessment mean bias scores behaved in a consistent pattern when sorted by groups (Table 34). As with the self-efficacy mean bias results, boys were more overconfident in their self-assessments than girls. Girls showed themselves to be more accurate self-assessors, but their bias scores had about the same level of dispersion about the mean as the boys. When the sample was grouped according to mathematics ability, the mean bias for the high-ability students was only slightly more than half that of the low ability students. The variability of the scores was also the highest for the low ability group. These results illustrate that students of higher mathematics ability were more accurate in their self-assessments than low-ability students. When viewed through the lens of reading ability, the data show that better readers were more accurate in their self-assessment of their mathematical problem-solving efforts than were middle or low readers. Low readers, on average, scored their work over one point higher than their actual performance scores, and the variability of low readers scores was higher than for middle or high ability readers. Whether or not these differences are large enough that inferences can be made about the population remains to be seen in the inferential analysis. 268

PAGE 284

Table 34 Self-Assessment Mean Bias Scores by Group Group n M SD By Gender Boys 129 0.97 0.56 Girls 108 0.71 0.58 By Mathematics Ability High 57 0.60 0.39 Middle 129 0.89 0.59 Low 51 1.04 0.66 By Reading Ability High 73 0.67 0.49 Middle 134 0.88 0.58 Low 30 1.16 0.67 Note. N=237. Range of mean bias scores is from -4 to +4. The box and whisker plots shown in figure 5 illustrate the differences in the distributions of the self-assessment mean bias scores when the sample was grouped according to three different grouping variables: gender, mathematics ability, and reading ability. The shape of the distribution of scores for boys and girls is very similar, with girls distribution being situated closer to zero (representing perfectly accurate mean bias) than boys. The plots showing the sample as grouped by mathematics ability illustrate a consistent pattern with the H-spread of the high ability students being closer to zero than either the middle 269

PAGE 285

or low ability groups. The whiskers for the three groups illustrate the same phenomenon observed in the self-efficacy plots, with the upper and lower quartiles for the middle and low ability group reaching farther into the underand over-confidence ranges than for high ability students. When the sample was re-grouped according to reading ability, the H-spread for the three groups followed the same pattern as was found when the students were grouped according to mathematics ability; the higher ability students were more accurate in their self-assessments. In addition the box representing the H-spread of the high ability students was narrower than the other two boxes. The most under-confident -1.5-1-0.500.511.522.53BoysGirlsH MathM MathL MathH RdgM RdgL RdgGroupSelf-Assessment Mean Bias Scores Q1 min median max Q3 Figure 5. Box and whisker plots of self-assessment mean bias scores grouped by gender, mathematics ability, and reading ability. 270
The most under-confident student was found in the middle ability group, and the most over-confident student was situated in the lowest ability group. It is interesting to note that none of the students in the low reading-ability group were under-confident.
Analysis of overall sample by question. Table 35 shows the self-assessment mean bias score average and standard deviation for the entire test as well as for each individual test item.

Table 35
Self-Assessment Mean Bias Scores by Question

Question         M      SD
Flowerpots      0.80   0.86
Ms. Hernandez   0.83   0.93
Calories        0.58   1.15
Jeans Class     1.01   1.10
Pizza           0.59   1.36
String          0.80   1.38
Balloons        0.38   1.14
Flour           0.90   1.26
Calculator      0.88   1.40
Lunch           1.18   1.08
Pencils         0.76   1.29
Birdseed        1.50   1.47
Total           0.85   0.58

Note. N=237, mean bias scores can range from -4 to +4.
The mean bias was found by subtracting the mean mathematical performance score from the mean self-assessment score. As a result, a positive bias score indicates students' overconfidence in how successfully they solved an item, and a negative score indicates students' under-confidence. As with the results from the self-efficacy data, which measured students' task-specific confidence before commencing work on each problem, students were also overconfident in their self-assessment of how well they actually performed for all twelve test items. Also, as with the self-efficacy results, students were the most overconfident about their performance on the birdseed problem, with a self-assessment score a point and a half higher than the average mathematical performance score. Students were the most accurate in their self-assessment on the balloon item, followed by the calories and pizza items. Interestingly, the mean bias score for self-assessment was higher than the mean bias score for self-efficacy for eight of the items. The measures were the same for the calories and calculator items, and the self-efficacy mean bias was higher than the self-assessment mean bias for the string and flour items, both of which involved fractions. These results indicate that, on average, students felt more confident about the outcome of their problem-solving efforts than they felt about their ability to solve the problem before they began working. This is especially true for the Ms. Hernandez, pizza, and school lunch items, whose average self-assessment mean bias scores were more than 0.20 points higher than their self-efficacy mean bias scores.
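As a concrete illustration of this calculation, the short sketch below subtracts mean performance from mean self-assessment for each item. It is only an illustrative Python analogue; the per-item means shown are hypothetical, not the study's figures, which were computed from the actual scored responses.

    # Sketch of the mean bias calculation; the per-item means below are hypothetical.
    mean_self_assessment = {"flowerpots": 3.4, "birdseed": 2.9}  # 0-4 scale
    mean_performance     = {"flowerpots": 2.6, "birdseed": 1.4}  # 0-4 scale

    for item in mean_self_assessment:
        bias = mean_self_assessment[item] - mean_performance[item]
        # positive = overconfidence, negative = under-confidence
        print(f"{item}: mean bias = {bias:+.2f}")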
Analysis of groups by question. Appendix S presents the mean and standard deviation of students' self-assessment mean bias scores organized according to group. Girls were overconfident about their performance on all twelve test items. Girls were the most overconfident about their work on the birdseed item, followed by the school lunch and Ms. Hernandez questions. Boys were also overconfident for all test items, but their self-assessments were the most accurate for the balloon item. Like girls, boys' highest level of overconfidence was exhibited on the birdseed and school lunch items.
Inferential Statistics
Factorial ANOVA Results
In order to suggest that differences in gender, mathematics ability, and reading ability would result in differences in students' self-assessment mean bias scores in the population, chance must be ruled out as a plausible explanation for the observed differences in the sample. To assess the tenability of a chance explanation, a 2x3x3 factorial ANOVA was conducted with an alpha level set at .05 for each effect. The degree to which the Type I error rates are actually controlled to the specified alpha level depends on how adequately the data meet the assumptions of independence, normality, and equal variances. The assumption of independence was met by ensuring that different observations came from different individuals. The study test was administered individually, and each student completed her/his work alone. The descriptive statistics indicate that the assumption of normality was not violated. To assess the assumption of equal variances, Levene's test for equal variance was
conducted with a resulting F(13, 224)=1.35, p=.1904. Because the p-value is above the alpha level of .05, the null hypothesis that the variances of the group scores are equal cannot be rejected, and the assumption of equal variances is not violated. Based on this analysis of the assumptions, it appeared reasonable to conduct the factorial ANOVA. The cell sizes for this study were not equal. Had the disparity in cell sizes not been judged to be the result of differences in the population, the regression approach (Type III) to analyzing the sums of squares would have been appropriate. However, consistent with the population, it was expected that most of the students in the sample would be categorized similarly for mathematics and reading, resulting in few students in the cells representing a high rating for one subject and a low rating for the other. Because the sample cell distribution was judged to be indicative of the distribution in the population, either the Type I (hierarchical) or Type II (experimental) method of computing sums of squares would be more accurate. The hierarchical method is appropriate only when there is an established order for the effects, which is not fully the case for this question. Gender, as a demographic variable, would clearly be placed first in the ordering, because although gender may affect students' mathematics and reading abilities, the reverse argument does not hold true. The issue is that the literature on the relationship between reading ability, mathematics ability, and self-assessment is so undeveloped that there is no theoretical basis for ordering the remaining two variables. For this reason, the Type II
experimental method for computing sums of squares was utilized for the factorial ANOVA for this question. The obtained F(13,224)=4.22, p<.0001 for the overall F-test indicates that one or more of the independent variables was significantly related to the self-assessment mean bias scores. The R-square value of 0.1843 indicates that over 18% of the variance in the self-assessment mean bias scores can be explained by one or more of the independent variables. The results of the main and interaction effects for the three-way factorial ANOVA are presented in Table 36. The findings of the factorial ANOVA for self-assessment mean bias indicate that the main effects of gender, with F(1,236)=18.92, p<.0001, and mathematics ability, with F(2,236)=3.85, p=0.0226, were statistically significant.

Table 36
Factorial ANOVA Results for Self-Assessment Mean Bias Scores

Source                   df     MS      F       p       ω²     f
Gender                    1    5.51   18.92   <.0001    0.07   0.28
Reading                   2    0.65    2.22    0.1111   0.02   0.18
Gender*Reading            2    0.18    0.63    0.5345   0.00   0.11
Mathematics               2    1.12    3.85    0.0226   0.01   0.14
Gender*Math               2    0.41    1.42    0.2428   0.00   0.07
Reading*Math              3    0.21    0.63    0.4877  -0.01   0.09
Gender*Reading*Math       1    0.01    0.03    0.8644   0.00   0.01

Note. N=237, ω² = omega squared, f = Cohen's f measure of effect size. *p<.05.
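The Levene check and the Type II factorial ANOVA described above could be run along the lines of the sketch below. The study itself used SAS; the Python code is only an illustrative analogue, and the file and column names (study_data.csv, sa_mean_bias, gender, math_level, reading_level) are hypothetical.

    # Sketch: Levene's test and a Type II three-way factorial ANOVA
    # (hypothetical data file and column names).
    import pandas as pd
    from scipy import stats
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.read_csv("study_data.csv")  # hypothetical per-student data file

    # Levene's test for equal variances across the gender x math x reading cells
    cells = [grp["sa_mean_bias"].values
             for _, grp in df.groupby(["gender", "math_level", "reading_level"])]
    print(stats.levene(*cells))

    # Type II sums of squares, consistent with the unordered-effects rationale above
    model = ols("sa_mean_bias ~ C(gender) * C(math_level) * C(reading_level)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))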
The low p-values of <.0001 and .0226 for gender and mathematics ability suggest that if the null hypotheses were true, the probability of obtaining an F as large or larger than the one obtained would be <.0001 for gender and .0226 for mathematics ability. Because these probabilities are so small (less than .05), the null hypotheses are rejected in favor of the alternative hypotheses. Because the variable of gender has only two levels, it can be concluded that the population group means differ. Mathematics ability is a three-level variable; therefore, further analysis is required to determine which pair or pairs of group means differ by a significant amount. The omega squared values of 0.07 and 0.02 for gender and mathematics ability, respectively, indicate that an estimated combined total of 9% of the variation in the population self-assessment mean bias is associated with these two variables. Gender exhibits a medium effect size with a Cohen's f value of 0.28, and the Cohen's f value for mathematics ability is small to medium at 0.18. Because the main effect for mathematics ability was significant, but the interaction effect was not, the Tukey follow-up test is an appropriate means of identifying the significance of differences in group means. The group means for self-assessment mean bias for the high, middle, and low mathematics ability groups were 0.60, 0.89, and 1.04, respectively. The difference between the means of the high and low groups was significant at 0.44 points, as was the difference between the means of the high and middle mathematics ability groups at 0.29. The difference between the means of the middle and low groups was too small to be significant.
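A Tukey HSD follow-up of this kind could be set up as sketched below. Again, this is a hedged Python analogue with hypothetical file and column names, not the procedure actually run in the study.

    # Sketch: Tukey HSD follow-up on the mathematics ability main effect
    # (hypothetical data file and column names).
    import pandas as pd
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    df = pd.read_csv("study_data.csv")  # hypothetical per-student data file
    tukey = pairwise_tukeyhsd(endog=df["sa_mean_bias"],
                              groups=df["math_level"],
                              alpha=0.05)
    print(tukey.summary())  # pairwise mean differences with significance flags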
These results show that students who were in the high mathematics ability group were significantly more accurate assessors of their own work than students in the middle and low ability groups. Accordingly, the students of lower mathematics ability were more likely to be overconfident in their mathematical performance. The descriptive data for the main effect of reading ability indicated a large difference in group means for the sample. Although the factorial ANOVA did not find reading ability to be significant at an alpha of .05, its effect size was larger than that of mathematics ability. A larger sample size for this study would have increased the power of the F test to detect the significance of this effect, which may or may not have resulted in it being classified as significant.
Multiple Regression Analysis
Introduction. The three predictors for this analysis were gender, FCAT reading scores, and FCAT mathematics scores. The FCAT scores ranged from 100-500 points, and were abbreviated as FCATR and FCATM for reading and mathematics scores, respectively. The sample consisted of 237 students, 108 girls and 129 boys. The purpose of the analysis was to determine the extent to which the predictors could provide an accurate prediction of students' self-assessment mean bias scores. This outcome variable was abbreviated as SAMBIAS, and had a potential range of -4 to +4. A score of -4 indicates a student who scored their own work an average of four points lower (on a 4-point scale) than her/his actual performance. A score of +4 could only be produced if a student scored his/her work for every problem a perfect 4, but had an actual score of zero for every problem. The most accurate self-assessors had self-assessment mean bias scores close to zero, with a positive score indicating
overconfidence, and a negative score signifying under-confidence. Because the descriptive data for these three predictors were analyzed in the multiple regression analysis for question two, that information will not be repeated here. The multiple regression test was initially run with the three main effect predictors and two additional interaction predictors, gender*FCATM and gender*FCATR. Because the interactions were not significant, and their inclusion obscured the relationships among the remaining variables, they were excluded from the analysis and discussion that follows. A table summarizing the multiple regression results with the interactions is provided in Appendix T.
Multiple regression results. A summary of the multiple regression results is presented in Table 37. The variables of gender and FCAT reading scores were found to contribute significantly to the predictive utility of the regression equation. Students' FCAT mathematics scores did not provide a significant contribution. Had there been any significant cost or time expenditure required to obtain these data, the variable would have been eliminated from the model. However, collecting FCATM in addition to FCATR would not require much additional effort, and because FCATM does have a slight impact on the equation's predictive utility, it will remain in the model. The R² value for the model was 0.2017, indicating that about 20% of the variability in students' self-assessment mean bias scores is accounted for by the combination of the three predictor variables. Because R² tends to be an overly optimistic estimation of the regression equation's ability to predict the outcome variable in the population, the adjusted R² was observed. The adjusted value, at
0.1915, represents shrinkage of one percentage point. Cohen's effect size f² = R²/(1 - R²) was calculated to be 0.2527, indicating a medium to large effect size. The standard error of estimate, as a measure of the typical prediction error, was 0.52325 points.

Table 37
Summary of Multiple Regression Analysis for Self-Assessment Mean Bias

Variable                     b         β        SE        t        p
Intercept                  2.9636     0.0      0.3034     9.77*   <.0001
Gender                    -0.2720   -0.2333    0.0713    -3.81*    0.0002
FCAT Mathematics Score    -0.0019   -0.1398    0.0011    -1.72     0.0863
FCAT Reading Score        -0.0040   -0.2868    0.0011    -3.60*    0.0004

Note. N=237. b = regression coefficient, β = standardized regression coefficient, SE = standard error, * indicates significance at α = .05 level.

The obtained prediction equation was:

SAMBIAS = 2.9636 - 0.2720*Gender - 0.0019*FCATM - 0.0040*FCATR
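To make the equation concrete, the sketch below encodes it as a small Python function. The 0/1 gender coding shown (boy = 0, girl = 1) is an assumption consistent with the interpretation that follows, not a coding reported in the document, and the example FCAT scores are hypothetical.

    # Sketch of the obtained prediction equation for self-assessment mean bias.
    # Gender coding (boy = 0, girl = 1) is assumed here, not reported in the study.
    def predict_sambias(gender, fcat_math, fcat_reading):
        # Coefficients taken from Table 37.
        return 2.9636 - 0.2720 * gender - 0.0019 * fcat_math - 0.0040 * fcat_reading

    # Two students with identical (hypothetical) FCAT scores, differing only in gender.
    boy = predict_sambias(gender=0, fcat_math=300, fcat_reading=300)
    girl = predict_sambias(gender=1, fcat_math=300, fcat_reading=300)
    print(round(boy, 4), round(girl, 4), round(girl - boy, 4))  # difference is -0.2720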
To aid in interpreting the equation, it is helpful to examine the impact on SAMBIAS of changing one of the predictor variables while holding the other two constant. To understand the impact of gender on the outcome, compare the SAMBIAS scores of two students, one boy and one girl, whose FCATM and FCATR scores were identical. The girl's predicted SAMBIAS would be 0.2720 points lower than the boy's. Because the intercept is a large positive number, and because SAMBIAS scores closer to zero indicate a greater level of self-assessment accuracy, this shows that, other things being equal, girls will have predicted SAMBIAS scores that indicate a more accurate level of self-assessment than boys. If gender and FCATR are held constant, a 100-point increase in FCATM will result in a predicted 0.19 point decrease in SAMBIAS, again indicating a more accurate level of self-assessment. If gender and FCATM are held constant, a 100-point increase in FCATR will result in a predicted 0.40 point decrease in SAMBIAS. The standardized regression coefficients are expressed in terms of standard deviation units and therefore allow a direct comparison of each predictor's level of impact on the predicted outcome. The variable with the highest standardized coefficient was FCATR, indicating that a 1 standard deviation increase in FCATR will result in a predicted 0.2868 standard deviation decrease in SAMBIAS. The next variable of greatest impact was gender, with a standardized regression coefficient of -0.2333. The variable of least impact was FCATM, with a standardized regression coefficient of -0.1398. This means that a 1 standard deviation increase in FCATM would result in a 0.1398 standard deviation decrease in predicted SAMBIAS. The squared semi-partial correlations were examined to determine each predictor variable's unique contribution to the model's ability to account for the variability in the outcome variable. The correlations were .04984 for gender, .01017 for FCATM, and .04433 for FCATR. These figures can be
interpreted to mean that the variables uniquely accounted for 5.0%, 1.0%, and 4.4% of the variability in SAMBIAS, respectively.
Assumptions. Regression analyses are based on many assumptions that should not be violated. The data were screened and an evaluation of the tenability of each of the assumptions was conducted. To assess the assumption of homoscedasticity of errors, the residuals were plotted against the predicted values. No patterns appeared in the plot, indicating that this assumption was not violated. The residuals were also found to represent a normal distribution, with skewness and kurtosis values of -0.06 and 0.18, respectively. A further analysis of the residual plot shows a linear relationship between the values, and based on the design of the study, there is no reason to believe that the residuals are not independent. Because the outliers were removed prior to the regression analysis through observation of the descriptive data, the evaluation of the students' Cook's D values did not identify any additional outliers. Another assumption is that the predictors were measured without error. The variable of gender was self-reported, and no incidence of error has been detected. Due to the stringent testing conditions for the FCAT, the measurement error associated with these two predictor variables is not deemed to be in violation of the assumption. The predictors of FCATM and FCATR cannot be considered fixed because the values may not be the same if the study is replicated. However, this is the case in many studies, and multiple regression is considered robust to violations of this assumption.
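The residual diagnostics described here could be reproduced along the lines of the sketch below. As with the earlier sketches, this is a Python analogue with hypothetical file and column names (study_data.csv, sa_mean_bias, gender_code, fcat_math, fcat_reading), not the SAS output actually screened.

    # Sketch: residual-vs-predicted plot, residual skewness/kurtosis, and Cook's D
    # (hypothetical data file and column names).
    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy.stats import skew, kurtosis
    from statsmodels.formula.api import ols

    df = pd.read_csv("study_data.csv")  # hypothetical per-student data file
    fit = ols("sa_mean_bias ~ gender_code + fcat_math + fcat_reading", data=df).fit()

    resid, fitted = fit.resid, fit.fittedvalues
    plt.scatter(fitted, resid)            # look for patterns (homoscedasticity check)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Predicted SAMBIAS")
    plt.ylabel("Residual")
    plt.show()

    print("skewness:", skew(resid), "kurtosis:", kurtosis(resid))  # normality check
    cooks_d = fit.get_influence().cooks_distance[0]                # influence diagnostics
    print("max Cook's D:", cooks_d.max())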
In conclusion, based on the analysis of the assumptions, multiple regression was an appropriate procedure for analyzing the data for this question.
Conclusion. Students' gender and FCAT reading scores were found to be significant predictors of their self-assessment mean bias scores. Their inclusion in the regression equation significantly increased the equation's ability to predict the outcome variable. Although the variable of FCAT mathematics scores was not a significant contributor, it did produce a slight increase in the overall R² value. Because there was not a significant cost or time factor in gathering these data, there is no harm and a small benefit in leaving it in the model. The regression analysis confirms the finding of the factorial ANOVA for gender. However, the factorial ANOVA found mathematics ability level to be significantly related to SAMBIAS, and reading ability level not to be significantly related. The differences in the outcomes may be due to the different ways that ability levels were defined for the two analyses. For the factorial ANOVA, students' FCAT level scores were combined with their teachers' ratings of their ability to arrive at a categorical ability level of high, middle, or low. For the regression analysis, the continuous FCAT scale scores were used as the measure of ability level. The categorization of ability into three discrete categories likely resulted in some students whose FCAT scale scores were nearly identical, but fell on different sides of a dividing line, being categorized in different ability levels.
Teachers' Assessment of Their Students' Performance
The data gathered from the teachers regarding their students' preparedness to be successful on the test items were used to calculate a mean bias score. The actual mean of students' mathematical performance scores for each question (0-4 scale) was subtracted from the mean of teachers' assessments of student preparedness (0-4 scale). The results indicate that the teachers, as well as the students, were overconfident, but not to the same degree. The students' average mean bias score was 0.85, compared to the teachers' average mean bias score of 0.40. Just as partitioning the student sample by gender revealed that boys were more overconfident than girls in their work, a partitioning of the teachers by gender showed that male teachers' overconfidence in their students' work (mean bias score of 0.57) exceeded that of female teachers (mean bias score of 0.32).
CHAPTER 5
SUMMARY, CONCLUSIONS, IMPLICATIONS, AND RECOMMENDATIONS
The purpose of this study was to examine how gender, reading ability, and mathematics ability differentially moderated students' mathematical problem-solving processes, linguistic explanations of those solution processes, achievement on a mathematical problem-solving test, self-efficacy on a self-reported rating scale, and self-assessment on a self-reported rating scale. This chapter contains four sections. The first section presents a summary of the study. The second section describes the conclusions and implications derived from the research findings, and is organized according to research question. The third section discusses the recommendations for practice based on the study conclusions and implications. The fourth and final section offers recommendations for future research.
Summary of the Study
Mathematical problem solving is a complex process that involves much more than simple calculation. For a child to complete a problem-solving task successfully, he/she must read and understand the problem situation, evaluate what the problem is asking, make a plan for what mathematical procedure(s)
must be used to solve the problem, complete the plan, assess the reasonableness of the answer, and then communicate the results. Previous research shows that the entire process is impacted by how well students can understand the problem they read, whether their mathematical knowledge provides them with the necessary tools to solve the problem, their confidence in their own ability to solve the problem, their ability to evaluate their own work, and their ability to communicate to the assessor what they have done. This study utilized a twelve-item constructed-response problem-solving test to gather data about students' problem-solving processes, their written linguistic communication of those processes, their self-efficacy when commencing work on a problem, and their accuracy in assessing their own work. The intent of the study was to answer the following research questions:
1) To what extent are students' choices of mathematical problem-solving processes related to gender?
2) To what extent is children's performance in mathematical problem solving related to gender, reading ability as measured by teacher ratings and Florida Comprehensive Assessment Test (FCAT) reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
3) To what extent is the quality of students' linguistic explanations of their problem-solving processes, as measured by a holistic rubric, related to gender, reading ability as measured by teacher ratings and FCAT
reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
4) To what extent is the relationship between students' feelings of self-efficacy when commencing work on a mathematical problem, as measured by a self-reported rating scale, and their actual performance on a mathematical problem-solving test, as measured by a holistic scoring rubric, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
5) To what extent is students' ability to assess their own performance on mathematical problem-solving tasks, as measured by a self-reported rating scale, related to gender, reading ability as measured by teacher ratings and FCAT reading scores, and mathematics ability as measured by teacher ratings and FCAT mathematics scores?
The study test questions were gleaned from released items of the FCAT and NAEP exams. Because problem-solving tasks span such a wide range of mathematical content, the scope of the test instrument was narrowed by limiting the items to those that assessed the Number and Operations standard. Problems that were not initially in constructed-response format were edited, and each question was followed by a request for the students to show their work and then explain how they knew they were correct. This request provided an inducement for students to write down their computational work and provide a linguistic explanation of their problem-solving processes. Each problem statement was
followed by a five-face smiley-face Likert scale that measured students' level of self-efficacy before their work on the item commenced. Each problem was presented on a separate page, leaving a large work space for children to show their work and explain their thinking. Following the work space was a second smiley-face scale that asked students to rate how well they thought they did on the problem. The processes students chose to solve the problems were analyzed primarily through qualitative means. The first step was to sort the student responses for each test item according to the process used. A total of 232 processes were identified for the whole test, with individual questions having from twelve to twenty-eight unique processes. A review of all processes led to a broad categorization scheme of successful, partially successful, and unsuccessful responses. For each of these categories, five to six themes emerged to serve as an organizing framework for the 232 processes. Table 13 (page 191) lists all seventeen of the themes, and notes which of the three broad categories each one falls under. Descriptive statistics were used to paint a broad picture of the themes of the processes used by boys and girls. Chi-square statistical tests were conducted with the aid of Statistical Analysis Software (SAS) to determine whether any gender differences existed in the themes of the processes students used to solve the problems. An alpha level of .05 was used to test for significance for all of the inferential statistical tests.
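The chi-square tests described above could be set up as in the sketch below. This is an illustrative Python analogue of the SAS analysis, and the gender-by-theme counts shown are hypothetical, not the study's actual frequencies.

    # Sketch: chi-square test of independence between gender and process theme
    # for a single item (counts are hypothetical).
    from scipy.stats import chi2_contingency

    #            theme A  theme B  theme C
    observed = [[   40,      55,      34],   # boys
                [   38,      41,      29]]   # girls

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2={chi2:.2f}, df={dof}, p={p:.4f}")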
Students' mathematical performance was scored using the FCAT four-point holistic rubric, resulting in a possible score range of 0-48 points for the test. Item-specific rubrics were also created for all test questions and are presented in Appendix K. The quality of students' linguistic explanations was rated according to a researcher-created three-point holistic rubric, resulting in a 0-36 point score range. Item-specific rubrics were created for the test questions to aid in the process of scoring the linguistic explanations and are included in Appendix L. Student responses on the self-efficacy and self-assessment scales were converted to a numerical score ranging from 0-4, resulting in a 0-48 point possible score range. Self-efficacy mean bias, as a measure of the accuracy of students' feelings of self-efficacy, was calculated as the average difference between students' self-efficacy scores and their performance scores. A positive mean bias indicated over-confidence, and a negative mean bias signified under-confidence. Self-assessment mean bias scores were calculated in the same manner. All four of the variables of performance, linguistic explanations, self-efficacy mean bias, and self-assessment mean bias were analyzed through descriptive statistics, factorial ANOVAs, and multiple regression analyses. The study sample was limited to the fifth-grade students in three schools in a west-central Florida school district. The schools had relatively homogeneous demographic profiles, and were in a part of the county populated by upper-middle-class families. Eligibility for participation was extended to all 396 students in the sixteen sample classrooms, but only those students whose parents provided signed permission were allowed to participate. The test was administered in the schools to 286 students, and a total of 284 students completed the study test. Information about the participants' reading ability, mathematics ability, and
gender was obtained through teacher ratings, previous FCAT scores, and, for gender, students' self-report. Reading and mathematics ability were treated as categorical variables for the factorial ANOVAs, and were determined through a combination of students' FCAT level scores and teacher ratings (Table 7). For the multiple regression analyses, reading ability and mathematics ability were continuous variables represented by students' FCAT scale scores. FCAT level scores were available for a total of 242 students, and FCAT scale scores were obtained for 239 students. An analysis of the descriptive data led to the exclusion of three outliers from analysis, one of whom would have already been eliminated due to missing test scores; this resulted in a total sample of 237 students, 129 boys and 108 girls, for the factorial ANOVA and multiple regression analyses. Previous research has been conducted on various pieces of the problem-solving process, but there are gaps in the literature for every element of the process that this study attempted to address. Most of the research related to problem-solving processes has been at the early elementary level, with a limited amount of research at the high school level. The research conducted with young children (Carr et al., 1999; Carr & Davis, 2001; Fennema et al., 1998) found significant gender differences in their strategies, but the problems were limited to single-step, non-contextualized addition and subtraction problems. The research conducted with high-school students (Gallagher & DeLisi, 1994; Gallagher et al., 2000) utilized complex problems, but the sample was limited to high-ability students. Prior to this research, there was virtually no information as to how
upper-elementary-aged boys and girls differed in their problem-solving processes. There is extensive research that verifies the connection between children's reading ability and mathematical performance, but a majority of that research focuses on children with reading disabilities and how those disabilities can best be accommodated in testing situations. Because problem solving is becoming an increasingly important focus of mathematics, and also because problem solving is dependent on textual communication, further research into the connection between reading and problem solving was still needed. The body of literature on gender differences in mathematical performance is extensive. Gender differences in performance in this country have declined over time, and are now rarely found below the middle grades. However, the relationship between reading ability and boys' and girls' mathematical problem-solving performance is uncharted territory and was of interest in this study. Previous research demonstrates that children benefit from the process of writing about their mathematical endeavors (Johnson et al., 1998; Pugalee, 2001; Rudnitsky et al., 1995), and girls have consistently outperformed boys in measures of writing performance (Gabrielson et al., 1995; Gambell & Hunter, 1999; Knudson, 1995; Malecki & Jewell, 2003; Pajares et al., 1999; Peterson, 1998; Peterson, 2000). However, although the research shows the benefits of writing about mathematics, none of the aforementioned studies attempted to link the quality of that writing to students' ability. For the research devoted to gender differences in writing quality, the studies did not focus on writing in the context of
mathematical problem solving. Also, although reading ability is known to be intrinsically linked to writing, students' reading ability was not included as a factor in any of these analyses. This research sought to add to the body of knowledge about the relationship between gender, reading ability, mathematics ability, and the quality of boys' and girls' linguistic explanations of their problem-solving processes. Self-regulated learning is linked to strong performance in mathematics, and self-efficacy, as an element of self-regulation, mediates the relationship between students' attitudes about mathematics and their mathematical achievement across age ranges. A preponderance of the self-efficacy research in the body of literature was conducted with high school and undergraduate college students. Only two studies dealt with the same population that this study endeavored to understand (Schunk & Gunn, 1986; Zimmerman & Martinez-Pons, 1990). Both of these studies are relatively old, and both only examined total self-efficacy, not the accuracy of that self-efficacy. In addition, neither study included a measure of student ability as an independent variable. To date, there is no research that clarifies the relationship between students' reading ability and the accuracy of their feelings of self-efficacy. This research used reading ability as an independent variable in the study of children's mathematical problem-solving performance, the quality of their written linguistic explanations of their problem-solving processes, and their self-efficacy and self-assessment. Previous research on self-assessment shows that, regardless of the accuracy of students' self-assessment, the simple act of assessing their own
work improved students' performance. Self-assessment is a step in the problem-solving process that, when omitted, can lead to a student submitting a final answer that he/she would have known to be incorrect had he/she performed even a cursory review to check for reasonableness. For these reasons, self-assessment should be, but has not been, an active area of research. Only one study has examined self-assessment at the elementary level, and the only independent variable was gender (van Kraayenoord & Paris, 1997). The remainder of the scant research has been conducted with high school and college students. This research endeavored to add to the existing body of knowledge on gender differences in the accuracy of children's self-assessments, and to generate new understandings about the relationships between reading and mathematics ability and children's self-assessment of their problem-solving efforts. In conclusion, this research sought to use the data collected through the study test of mathematical problem solving to add to the body of knowledge on various elements of children's problem-solving behaviors. The analysis of students' problem-solving processes used gender as the only independent variable. The remaining analyses of performance, linguistic explanations, self-efficacy, and self-assessment examined the data through the lens of children's gender, their reading ability, and their mathematics ability.
Conclusions and Implications
Research Question One: Problem-Solving Processes
Problem-Solving Process Conclusions
The data for this research question were generated through a qualitative analysis of students' problem-solving processes. The processes for each of the test questions were categorized into seventeen emergent themes and are presented in Appendix M. The data were analyzed descriptively (Table 14), and chi-square tests were used to determine whether there were any significant gender differences in the processes that students chose to solve each of the questions. The chi-square results are presented in Table 15 (page 197). The descriptive data in Table 14 (page 195) suggest that there were gender differences in the themes of the processes students used to solve the problems in the study sample. To test for the significance of these differences and maintain the overall alpha level at .05 across the thirteen chi-square tests, the Bonferroni adjustment was used to adjust the per-item alpha level downward to 0.004. Table 15 (page 197) shows that none of the differences were significant at the .004 alpha level. As a result, even though the data for the sample show gender differences, those differences were not large enough to conclude that they were representative of differences in the population. The pencil problem processes, as presented in Table 16 (page 198), represented the largest effect size of all the problems. For this item, more girls than boys were coded with themes categorized as unsuccessful. Many more boys than girls used a novel approach to solve the problem, and more girls than boys solved the problem
using a drawing or diagram. The results for this question mirrored the overall finding that the girls in the sample were more prone to solving the problems through the use of drawings or diagrams. Because none of the inferential results were significant, no conclusions can be drawn about the population as a whole. In the earlier work of Carr et al. (1999), Carr and Davis (2001), and Fennema et al. (1998) with first- through third-grade students, significant differences were found with much smaller samples. There are many potential reasons why the results of this study did not confirm the work of previous researchers. To begin, the aforementioned studies used non-contextualized, single-step addition and subtraction computation problems with a dichotomous coding scheme. The complex nature of the problem-solving items on this test led to a much more detailed coding scheme that, given the large number of themes, was not able to detect differences that may have existed. A larger sample size, and the increased power associated with a larger sample, may have led to different results. There is also a possibility that no differences reached significance because no real differences exist in the population. Regardless of the lack of significant findings regarding gender, the descriptive statistics still provide valuable information about children's problem-solving processes. Less than half of all student responses fell under the theme of successful solutions using a traditional method. By collapsing the data for the remaining themes across categories, such as combining the data for theme #2, "Novel method, successful," with that of theme #10, "Novel method, partially
successful," a clearer picture is created. In all, 6.1% of the students used a novel method, 4.9% of the responses were solved with a drawing or diagram, 6.7% of the responses were not successful due to computational problems, 4.3% had problems with mathematical logic, and fully 12.4% were unsuccessful due to a misunderstanding of the question or inattention to the question.
Problem-Solving Process Implications
The descriptive statistics for the collapsed themes provide information vital for teacher educators and those who teach mathematics. The data show that students often chose a novel solution method, and those who sought a novel solution process were successful most of the time. Students also frequently solved, or attempted to solve, problems through the use of a drawing or diagram. In other words, many students were creative in their computational approach to problem solving, whether or not they arrived at a correct solution. The mathematics education system of this country is sometimes characterized as being "a mile wide and an inch deep." Teachers are too often satisfied with finding and discussing one productive means of solving a problem. These results show that children inherently choose a variety of methods for solving problems, and should be supported in pursuing and becoming successful in those methods. In some countries where children consistently score well on international tests, a distinguishing characteristic of their classroom instruction is the depth of their coverage of material (Fierros, 1999). Instead of doing ten problems one way in a session, they may do one problem ten different ways. This instructional style acknowledges children's multiple ways of thinking and
encourages creative approaches to problem solving. The results of this study indicate that an approach of less breadth and more depth may help ensure that children are better equipped to be successful in using a variety of problem-solving methods. Of even greater consequence is the result that one of every eight responses (381 total) was not successful due to students' lack of understanding of the question or inattention to the question. This situation may be the result of reading comprehension difficulties or a lack of self-discipline in reading the entire problem carefully. Many more responses (553 total) were coded with unsuccessful themes for flawed mathematical logic or incoherence ("Cannot be determined"). The data for this study do not allow any conclusions to be drawn about how many of these responses were directly related to reading difficulty, but reading difficulty cannot be ruled out as a precursor to the failure. In any case, these numbers are too large to be ignored. Regardless of how teachers of mathematics choose to view their responsibility vis-à-vis reading, they must acknowledge and be willing to act on the fact that students who are poor or careless readers cannot be successful problem solvers.
Research Question Two: Mathematical Performance
Mathematical Performance Conclusions
The data for this research question were generated through the holistic assessment of student work on the twelve-item problem-solving test. All responses were scored using the FCAT four-point rubric along with item-specific rubrics that were created using data from the pilot administration of the test. The
data were analyzed descriptively, and through factorial ANOVA and multiple regression tests. The descriptive results are presented in Tables 17 (page 201) and 18 (page 204), the factorial ANOVA results are in Table 20 (page 213), and the multiple regression results are in Table 21 (page 218). The independent variables for the inferential statistical tests were gender, reading ability, and mathematics ability. For the factorial ANOVA, reading and mathematics ability were categorical variables whose values were determined using a matrix (Table 7, page 142) based on teacher ratings and students' previous year's FCAT level scores. For the multiple regression, reading and mathematics ability were measured solely by the continuous variable of students' previous year's FCAT scale scores. The gender-related results of this study serve to confirm earlier research findings that show no significant differences in the mathematics performance of boys and girls at the elementary level. The factorial ANOVA did not find gender to be significantly related to performance, and the predictor of gender did not contribute significantly to the predictive utility of the regression equation. An interesting result that came out of the factorial ANOVA was the significance of the interaction effect between gender and reading ability for mathematical performance (see Figure 2). Gender was not a significant variable, yet the interaction of gender with reading ability reached significance. Girls appear to have profited more from high reading ability than boys, and suffered more from having only medium reading ability than boys. Among low-ability readers, boys' and girls' scores were nearly identical.
As expected, all statistical tests showed a strong relationship between mathematics ability and performance on the study test. These results do not add to the body of knowledge and therefore will not be discussed here. For the factorial ANOVA and multiple regression analyses, reading ability was shown to be significantly related to mathematical performance. Taking all the descriptive and inferential results together leads to the conclusion that although mathematics ability had by far the strongest relationship to mathematical performance among the variables, reading ability also plays a significant role in students' mathematical problem-solving performance.
Mathematical Performance Implications
Both the descriptive and inferential results indicate a strong relationship between mathematics ability and mathematics performance. The implication that better mathematics students score better on a mathematical test is not noteworthy. Due to the lack of significant findings about the relationship between gender and mathematical performance, no implications can be made from the results for gender. The relationship between reading ability and mathematical performance bears further discussion. Together with the findings for question one, the results indicate that mathematical competence alone does not determine students' success in problem-solving endeavors. Mathematical story problems are communicated through print. Before a student can begin the mathematical part of the problem-solving process, he or she must be able to read and understand the problem situation and what the problem is asking. The text plays a mediating
role between the student and the mathematics, and can only be activated by competent reading comprehension. This reality does not receive enough attention in the elementary classroom or in the mathematics methods courses in preservice teacher programs. The factorial ANOVA result indicating a significant interaction between gender and reading ability at the middle-ability level is also important. Gender alone was not found to be significantly related to performance, but as gender interacted with reading ability, especially among middle-ability readers, significant differences in performance appeared. Unlike high-ability readers, boys of middle reading ability had a distinct advantage over girls in mathematical problem solving. The strength of the results allows for the conclusion that this interaction exists in the population. In an age where equal treatment of all children, regardless of demographic characteristics, is so important, this result deserves consideration in the classroom. Teachers must be aware that their female students of middle reading ability may struggle more with comprehension of mathematical story problems than male students of similar ability.
Research Question Three: Linguistic Explanations
Linguistic Explanation Conclusions
The data for this research question were gathered through analysis of students' explanations of their problem-solving processes on the twelve-item problem-solving test. Each student response was scored using the researcher-created holistic three-point rubric supported by item-specific rubrics. The item-specific rubrics used student responses from the pilot administration of the test to
aid in identification of the qualities of answers for each question that would earn each score. The data were first analyzed descriptively (Table 22, Table 23, and Figure 3, pages 222, 225, and 226, respectively), and then inferentially via a factorial ANOVA (Table 24, page 230) and multiple regression (Table 25, page 233). The dependent variable for this question was students' total linguistic explanation score (ranging from 0-36), and the independent variables were gender, reading ability, and mathematics ability. Although the quality and completeness of girls' linguistic explanations were judged superior to boys' for three-fourths of the questions, the differences were not large enough for the factorial ANOVA to conclude that the differences exist in the population. Gender was the variable that showed the lowest correlation with the outcome variable for the multiple regression. Including it in the model did improve the predictive power of the regression equation, but not by a significant amount. The results from the factorial ANOVA and the multiple regression show that reading ability and mathematics ability were significantly related to students' linguistic explanations. In both cases, the relationship was strong enough to conclude that it holds true for the population as well. The factorial ANOVA p-value for reading ability was lower than the p-value for mathematics ability. Also, the squared semi-partial regression coefficients showed that reading ability accounted for more than twice the variability in the outcome as mathematics ability. In essence, of the three independent variables, the variable with the
strongest relationship to the quality of students' linguistic explanations was reading ability.
Linguistic Explanation Implications
The finding that gender was not significantly related to the linguistic explanation scores runs counter to previous research findings that girls are consistently better writers than boys (Gabrielson et al., 1995; Gambell & Hunter, 1999; Gormely, 1993; Knudson, 1995; Malecki & Jewell, 2003; Pajares et al., 1999; Peterson, 1998; Peterson, 2000). Every single study about gender differences in writing ability clearly found girls' writing skills to surpass those of boys. The difference between this research and previous studies is that this study was the only one to measure writing in the context of mathematical problem solving. There were no significant gender differences in performance, yet given girls' proven superiority over boys in written communication, it is surprising that the mathematical context of the writing would strip away the gender differences in the outcomes for linguistic explanations. The descriptive statistics showed that, for both genders, the scores were much higher for mathematics performance than for linguistic explanations. These findings taken together would suggest that girls and boys would both benefit from instruction and practice in writing that is specifically targeted to mathematics. The notion that reading ability and writing ability are two sides of the same coin has gained widespread acceptance in recent years. However, most of the discussion about the relationship between these two literacy skills has not moved beyond the field of language arts. What is interesting about the results of this
study stems from the fact that the scores for students' linguistic explanations were based on students' writing about their mathematical problem solving. A student who was unable to work the problem mathematically would not have been able to explain his/her problem-solving processes. As a result, the child's mathematical difficulty would have resulted in a failure to achieve a successful score for linguistic explanation. As such, the finding that mathematics ability was significantly related to students' linguistic explanation scores makes logical sense. However, the results for reading ability show that not only was reading ability significantly related to students' linguistic explanation scores, its relationship was even stronger than that of mathematics ability. These findings indicate that improving students' ability to explain their mathematical problem-solving processes successfully is more a matter of enhancing their mathematical reading ability than their mathematics skills.
Research Question Four: Self-Efficacy
Self-Efficacy Conclusions
The outcome variable for this research question was students' self-efficacy mean bias scores as a measure of calibration: the accuracy of their feelings of self-efficacy. Students self-reported their self-efficacy for each of the study's twelve test items by circling one of five faces on a Likert-type smiley-face scale. Their selection was then converted to a numerical score ranging from four for the highest level of self-efficacy to zero for the lowest level. The average of their mathematical performance scores was deducted from their average self-efficacy score to arrive at a mean bias score. A positive mean bias indicated over-confidence, and a negative score showed under-confidence.
The independent variables for this analysis were gender, reading ability, and mathematics ability. The descriptive statistics for students' self-efficacy scores show that boys were more self-efficacious than girls for all twelve of the test items. The self-efficacy mean bias scores were similarly patterned, with boys' bias score of 0.88 dwarfing girls' score of 0.58. Both genders were over-confident, but boys' over-confidence eclipsed that of girls. The factorial ANOVA results indicate that the gender differences in mean bias for the sample were large enough to conclude that the differences truly exist in the population. The multiple regression findings confirm this result, with gender being the most significant predictor of self-efficacy mean bias among the study variables. For the independent variable of reading ability, the descriptive statistics showed that students of higher reading ability were generally more accurate in their self-efficacy judgments than lower-ability readers. However, the factorial ANOVA results were not significant, meaning that chance cannot be ruled out as the explanation for the differences. In contrast, the multiple regression found that reading ability was able to predict more of the variability in students' self-efficacy mean bias scores than either gender or mathematics ability. These differing results may have been the result of the difference in how the variable of reading ability was defined for each of the two statistical tests. Because of the strength of the relationship between reading ability and self-efficacy mean bias found in the multiple regression, the study conclusion is that reading ability is a significant
predictor of self-efficacy mean bias, with better readers exhibiting more accurate feelings of self-efficacy than lower-ability readers. The factorial ANOVA and multiple regression results for mathematics ability were significant. This finding leads to the conclusion that the sample students of higher mathematics ability were more accurate in their feelings of self-efficacy than students of lower ability, and that these relationships exist in the population as well. There was also a moderately strong negative correlation between self-efficacy mean bias and total performance, indicating that students with lower self-efficacy mean bias scores achieved higher mathematics performance scores.
Self-Efficacy Implications
No previous research has studied the accuracy of students' self-efficacy in the elementary grades. The only two studies dealing with this age group observed total self-efficacy as it related to gender (Zimmerman & Martinez-Pons, 1990), and the relationship between self-efficacy and performance (Pajares & Miller, 1995). Zimmerman and Martinez-Pons (1990) found there to be no significant difference between the self-efficacy of boys and girls, contradicting the results of this study. However, three studies with middle- and high-school students that included gender as a variable (Laveault et al., 1999; Pietsch et al., 2003; Randhawa et al., 1993) found males to have higher levels of self-efficacy than females. All of the studies that observed the relationship between self-efficacy and performance found a significant positive relationship (Cooper & Robinson, 1991; Hackett & Betz, 1989; Lent, Brown, & Gore, 1997; Multon et al.,
1991; Pajares & Miller, 1994; Pietsch et al., 2003; Randhawa et al., 1993; Schunk & Gunn, 1986; Williams, 1994). A student's level of self-efficacy is positively related to her or his deployment of the self-regulatory strategies and strength of persistence necessary for successful problem solving (Multon et al., 1991). By combining the results of this study with findings of previous research, it can be concluded that boys' higher self-efficacy gives them a performance advantage over girls. Because boys' and girls' performance did not differ significantly in this study, and because boys were consistently more self-efficacious, it could be said that if there had been no gender differences in self-efficacy, boys may not have scored as high as girls. The interpretation of the results for self-efficacy mean bias is not as clear. For this study, both genders were over-confident, but boys were more overconfident than girls. There has been no research in the field of mathematics that addresses the relationship between differing levels of over-confidence and performance. Therefore, although the results of this study add to what is known about how accurately boys and girls can judge their own self-efficacy, no implications can be drawn from these findings. No previous research has attempted to link reading ability to self-efficacy mean bias. The finding that reading ability is a significant predictor of students' mathematical problem-solving self-efficacy mean bias is an important one. It implies that the students who were better able to read and comprehend the test items were also better able to judge how well they could perform on the
problems. When these results are added to the correlation results, it can be concluded that, because students of higher reading ability are more accurate in their self-assessment, and because students with more accurate self-assessments scored higher on the study test, improved reading ability leads to improved performance. The viability of this conclusion is verified by the mathematical performance results discussed earlier. The factorial ANOVA and multiple regression results for mathematics ability were significant, confirming that higher-ability students were more accurate in their feelings of self-efficacy, and that this relationship exists in the population. This finding implies that when students are better equipped to handle the mathematics involved in a story problem, they are better able to judge how well they will perform on that problem.
Research Question Five: Self-Assessment
Self-Assessment Conclusions
The data for this question were gathered in much the same way as the self-efficacy data. Whereas self-efficacy was measured after students had read the problem but before they began working, self-assessment was measured at the completion of the work for each problem. For self-assessment, students were asked to indicate, via the circling of one of five faces, how well they thought they had done on each problem. The student responses were converted to numerical scores, with four representing the highest level of self-assessment and zero representing the lowest level. The data were analyzed descriptively and inferentially through factorial ANOVA and multiple regression analyses. The

PAGE 322
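As a concrete illustration only, the short sketch below computes a mean bias score in the way just described. The Python language, the function name, and the example numbers are assumptions made for the sketch and are not taken from the study's instruments; only the 0-4 scales and the definition of bias as the average self-rating minus the average performance score come from the description above.

# A minimal sketch, assuming Python and hypothetical data, of the mean
# bias score described above. Ratings and performance scores are both on
# a 0-4 scale; bias = mean(rating) - mean(performance), so a positive
# value signals over-confidence and a negative value under-confidence.

def mean_bias(ratings, performance):
    """Average self-rating minus average performance score."""
    return sum(ratings) / len(ratings) - sum(performance) / len(performance)

# Hypothetical student: one self-assessment rating and one rubric-scored
# performance score per problem.
ratings = [4, 3, 4, 2, 3, 4, 3, 2, 4, 3, 3, 4]
performance = [3, 3, 2, 2, 1, 3, 2, 2, 3, 2, 1, 3]

print(mean_bias(ratings, performance))  # 1.0 -> over-confident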

The descriptive results show that boys rated their work higher than girls, but that they were less accurate in their ratings. For the sample, both reading ability and mathematics ability were negatively related to self-assessment mean bias. Students with higher ability had lower mean bias scores, indicating more accurate self-assessment. The factorial ANOVA results show that the relationship between gender and self-assessment mean bias was strong enough to conclude that the differences exist in the population. Mathematics ability was found to be significantly related to self-assessment mean bias, with higher-ability students being more accurate in their self-assessments. The relationship between reading ability and self-assessment mean bias was not strong enough to infer that the relationship holds true in the population. As with self-efficacy mean bias, the results of the multiple regression for self-assessment mean bias appear to contradict the findings of the factorial ANOVA. As discussed earlier, this seeming contradiction may be a reflection of the different ways that ability was measured for the two inferential tests. Reading ability was the variable found to be the strongest single predictor of self-assessment mean bias. The variable of gender was also found to increase the ability of the regression equation to predict self-assessment mean bias. Mathematics ability made a small, but not significant, contribution to the equation.
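The text above attributes the seeming contradiction between the factorial ANOVA and the multiple regression to the different ways ability was measured for the two tests. One plausible reading of that difference, adopted here purely for illustration, is that ability enters the ANOVA as categorical levels and the regression as continuous scores. The sketch below assumes Python with pandas and statsmodels and uses hypothetical file and column names; it is not a reproduction of the study's actual models or software.

# Minimal sketch (assumed Python/statsmodels, hypothetical file and column
# names) of the two analyses described above for self-assessment mean bias.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical: one row per student

# Factorial ANOVA: gender and ability entered as categorical factors.
anova_fit = smf.ols(
    "sa_mean_bias ~ C(gender) * C(reading_level) * C(math_level)", data=df
).fit()
print(sm.stats.anova_lm(anova_fit, typ=2))

# Multiple regression: gender plus continuous reading and mathematics scores.
reg_fit = smf.ols(
    "sa_mean_bias ~ C(gender) + reading_score + math_score", data=df
).fit()
print(reg_fit.summary())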

Self-Assessment Implications

Just as with self-efficacy, both boys and girls were over-confident in the assessment of their work. The findings about the relationship between gender and self-assessment also mirrored those for self-efficacy, with boys being more over-confident than girls. Both inferential tests, the factorial ANOVA and the multiple regression, concurred that gender was significantly related to self-assessment mean bias. These findings confirm earlier research about the relationship between gender and self-assessment (Pallier, 2003; van Kraayenoord & Paris, 1997; Vermeer et al., 1994; Wright & Houck, 1995), although not all of these studies focused on mathematics. Only one small study dealt with elementary-aged students (van Kraayenoord & Paris, 1997), and it was not focused on mathematics. For these reasons, this study contributes to the body of knowledge by focusing on self-assessment in the area of mathematical problem solving and by researching an understudied population. The findings related to gender imply that, regardless of ability or actual performance, boys will walk away from a mathematical problem-solving activity with more confidence than will girls.

If a teacher were to desire to predict how accurately his/her students would be able to assess their own mathematical problem-solving work, he/she should look first to students' reading ability. This is an interesting finding, because teachers of mathematics would likely assume that accuracy of self-assessment would vary primarily according to mathematics ability.
However, the multiple regression analysis did not find a significant relationship between mathematics ability and self-assessment mean bias. An important phase of the problem-solving process is looking back to assess the reasonableness of the answer. In fact, the most enduring problem-solving heuristic (Polya, 1957) includes self-assessment, in the form of looking back, as the fourth of its four steps. The only two studies that have examined the relationship between self-assessment and performance found that simply asking students to assess their own work led to improved performance (Hassmen & Hunt, 1994; Koivula et al., 2001). It is during the self-assessment process that students are able to compare their answer to the results they may have expected, allowing them to detect any potential computational or logical errors. Because of the importance of self-assessment, and because reading ability is such a strong predictor of the accuracy of students' self-assessment, these results serve to further highlight the importance of reading in the mathematics classroom.

Recommendations for Practice

The results of this study, coupled with the understandings provided in existing research, lead to some recommendations for teachers and teacher educators. As discussed in the review of literature, the language, vocabulary, and symbols used in mathematics are unique among the content areas. For this reason, teachers of mathematics are better equipped to teach the reading of mathematics than are regular reading teachers.
The descriptive results for problem-solving processes clearly show that students' lack of comprehension of the problem statement was negatively related to performance. The results for mathematical performance, self-efficacy, and self-assessment all confirm the strength of the relationship between students' reading ability and their mathematical problem-solving behaviors. Students who are better readers score higher, are more accurate in their feelings of self-efficacy, and are better assessors of their own work. These results serve as an impetus for mathematics teachers to broaden their view of their role to include teaching the reading of mathematics and to devote more of their classroom instructional time to helping students become more competent readers of mathematics.

Lest all the blame be placed on the shoulders of teachers, these results further suggest that undergraduate programs in mathematics teacher education, be they at the elementary or secondary level, must ensure that their preservice teachers are fully prepared to be teachers of mathematics reading. To provide a full preparation requires more than just supplying the requisite pedagogical content knowledge; it also requires that teacher educators adopt new attitudes and beliefs about their role as mathematics reading teachers. Although the addition of a content area reading course to some secondary mathematics education programs is a positive indicator of change, the addition of a single course may lull program designers into believing they have addressed the issue. Because the reading of mathematics is critical to all problem-solving endeavors, regardless of the content strand, instruction in reading mathematics should instead be woven through all mathematical content in preservice teacher preparation programs.
Lortie's (1975) theory of the apprenticeship of observation serves as a caution to teachers about their tendency to teach as they were taught. The cycle will not be broken until teacher educators take it upon themselves to become educated about the role of reading in their content area and make that new knowledge, and the resulting beliefs, a central tenet of all courses in their teacher education programs.

The current system of reporting a student's high-stakes mathematics test score as a single number may be misleading. The results of this study show that when mathematical problems are presented in context, a student's reading ability is significantly related to his/her ability to complete the problem successfully. For this question format, reading ability is a confounding variable and may lead to spurious conclusions regarding a student's remediation needs. Contextualized problems are an important element of mathematics assessment and should remain a part of high-stakes tests. However, to increase the utility of score reporting and teachers' ability to remediate students having difficulty, perhaps students' performance on contextualized story problems should also be reported separately. If a student's scores for this problem type are lower than her/his scores for non-contextualized problems, and he/she also scored low for reading comprehension, intensive support for reading comprehension may serve to improve both areas. In this case, time spent on mathematical computation remediation may be a wasted effort.
The results of this study related to gender confirm that gender differences in overall performance are not significant in the elementary grades. However, gender differences exist in other facets of problem solving. The inferential tests confirmed that regardless of ability or actual performance, boys commence problem-solving activities with much stronger feelings of self-efficacy than girls. Because self-efficacy is positively correlated with persistence (Multon et al., 1991), boys' high level of efficacy may give them an advantage over girls on difficult tasks. This may further explain why boys outscored girls on the most difficult items. Armed with this knowledge, teachers should monitor girls' self-efficacy during problem-solving tasks and seek ways to encourage them to persist through challenges.

These differences in self-efficacy may also be an important teacher focus when preparing students to take high-stakes tests. These tests tend to be composed of questions covering a wide range of difficulty. All students, but especially girls, should be coached in the test format and testing strategies, including ways to handle difficult questions. If girls are reminded that a difficult question may be followed by an easy question, they may be more inclined to persist through failure. Knowing that girls tend to be less confident, teachers should regularly seek ways to reaffirm girls' confidence in what they know and can do.

The gender results for self-assessment indicate that girls are more accurate in their self-assessments than boys. The results of previous research show that self-assessment is a vital element of the problem-solving process and is positively related to performance.
Self-assessment has received very little research attention, it is not prevalent in mathematics curricula, and it is not, in the researcher's experience, a major topic of discussion in either the elementary classroom or the preservice teacher classroom. The results of this study indicate that because boys and girls differ greatly in the accuracy of their self-assessment, and because there was a moderately strong correlation between the accuracy of students' self-assessments and their mathematical problem-solving performance, more effort needs to be made to provide opportunities for self-assessment in the classroom. If students are frequently asked to think about the quality of their work, they may become more reflective in their thinking and be more apt to uncover and remedy any errors they have made. For high-stakes testing, students should be provided with enough experience in using the actual scoring rubrics to assess their own work and the work of others that they fully understand the qualities of superior problem-solving work. They will then be equipped to monitor, assess, and edit their own work in a testing environment.

The results related to mathematics ability were largely expected. Higher-ability students performed significantly better, provided more accurate and complete linguistic explanations of their problem-solving processes, and were more accurate in their self-efficacy and self-assessment ratings. These results mirror previous research findings and do not contribute to the body of knowledge other than to reaffirm that students' general mathematics ability is reflected in all areas of their problem-solving performance. Teachers should be reminded that their lower-level mathematics students will need greater support not only for the computational aspects of problem solving, but also for clearly explaining their thinking and accurately gauging their self-efficacy and self-assessment.
The recommendations directed to mathematics teacher educators about the inclusion of literacy in mathematics instruction and the nurturing of a variety of problem-solving processes also apply to regular classroom teachers. Admittedly, these recommendations may be received by classroom teachers as unrealistic requests to add even more material to an already overcrowded curriculum. In today's high-stakes testing environment, many teachers are completely overwhelmed by the standards they are to cover and the material their students must know. Also, in some school districts, teachers have little or no flexibility in determining the mathematics content they must teach each day or the materials that must be used. How then are they to integrate more literacy into mathematics or encourage a variety of problem-solving processes?

A potential answer lies in the issue of breadth versus depth in the mathematics curriculum. Without realizing the impact of their traditional pedagogy, many teachers begin their math lesson by teaching all of the sample problems, demonstrating how to solve each type, then hoping that their students will remember the steps in their proper sequence. The teaching portion of the mathematics time is often followed by a time for students to practice what they have learned. At no point in this type of lesson are students required to read any mathematical text to comprehend a new concept or determine what type of problem they are being asked to solve. At no point are students asked to think about how they might use their prior knowledge in an attempt to solve a new problem.
Finally, this lesson format does not provide students with an opportunity to communicate their thinking, either orally or in writing. The outcome of this pedagogy may be the development of a classroom of mathematical imitators, rather than mathematical thinkers and problem solvers. The risk is that this pedagogy sacrifices depth for breadth, with the result that students gain a wide range of low-level knowledge that is fragile, in that it is potentially damaged rather than strengthened by the acquisition of new knowledge, and that cannot be applied to novel problem-solving tasks. A more conceptually based pedagogical alternative sacrifices breadth for depth, but can result in students securing a narrower range of high-level understanding that provides a strong foundation for future learning and can be flexibly applied to novel problem situations.

Teachers who lament that they cannot possibly teach everything that is on a high-stakes test are justified in their concerns. It is unfeasible for any teacher to teach students all the possible problems they may encounter on a comprehensive test. Also, if students are not routinely expected to tackle mathematical text, new concepts, and mathematical communication on their own, they will not be equipped to handle these tasks on a high-stakes test. As hopeless as the situation may appear, there is a potential solution. Even in the most strictly controlled environment, where teachers are told what pages of the mathematics text to teach each day of the week, they can create their own space by using pedagogy that scaffolds conceptual development of the required content along with students' ability to read mathematical text and write about their mathematical thinking.
A conceptual approach to mathematics instruction integrates all of the literacy skills of reading, writing, listening, and speaking, and nurtures the development of mathematical thinkers. A conceptually based lesson might begin by having students brainstorm what they already know that may help them understand the day's new concept. They could then read the mathematics text that explains the new concept, either alone, in pairs, or in groups. Students may then be given a few minutes to discuss the reading in groups or as a class, relating the new concept(s) back to their prior knowledge. The teacher could then pose a problem that involves the new concept and give students time to work on it. As the teacher circulates around the room, he/she can note which students have made significant headway in solving the problem, and ask a few students who have used different solution methods to present their work on the board. As each student presents and explains his/her work, the teacher can guide the discussion through questioning strategies to ensure that the student has explained not just the how of their work, but also the why. All solution methods that utilize sound mathematical logic should be valued equally. Students can then practice with a few new problems using whatever method(s) support their conceptual understanding. The lesson can conclude with a journal-writing assignment wherein each student either explains whose solution method he/she liked best and why, or explains the new concept in everyday language.

As described, this lesson covers the same content as the more traditional approach, with less breadth but more depth. Students do not have time to solve as many problems, but they have the opportunity to link their new understanding to prior knowledge, reducing the amount of new knowledge and practice required.
The conceptual approach allows students to communicate about the mathematics, integrating reading, writing, listening, and speaking into the lesson. It also promotes and values children's different ways of thinking instead of expecting that all children should think and solve problems the same way. Finally, by teaching conceptually, teachers are preparing their students to independently read a novel problem, think about how their prior knowledge can be used to solve it, then complete the work and communicate their answer. Essentially, conceptual teaching allows teachers to work within the confines of their school district's curriculum while helping to prepare students to be mathematical thinkers and problem solvers who can be successful on high-stakes tests.

Recommendations for Future Research

As much as this study answers some questions about children's problem-solving processes, it leads to new questions. Previous research shows distinct differences between boys and girls in the early elementary years that were detected with samples only a third the size of the sample for this study. If the lack of significant findings in the area of problem-solving processes for this study was the result of the study being underpowered, no conclusions can be drawn. However, if the results indicate that the differences that exist among young students decline over time and are no longer detectable by the fifth grade, the implications would be enormous. A retrospective power analysis was conducted to determine whether the non-significant findings were more likely the result of the study being underpowered or a reflection of the fact that no significant differences exist in the population. Because there were 13 chi-square tests, the Bonferroni adjustment was applied to maintain the overall alpha level at .05, resulting in an alpha of .0038 for each test. For the study to have a power of .80 to detect differences at this alpha level, it would need 1523 boy and 1279 girl participants (Fleiss, 1981). The lack of significant differences was therefore likely the result of the sample size being too small.
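The arithmetic of the adjustment, and the kind of power calculation such a retrospective analysis involves, can be sketched as follows. Only the division of the .05 alpha across 13 tests comes from the text; the Python/statsmodels call, the effect size, and the number of process categories are hypothetical placeholders, so the resulting sample size is illustrative and will not match the Fleiss (1981) figures quoted above.

# Minimal sketch (assumed Python/statsmodels, hypothetical effect size) of
# the Bonferroni adjustment and a chi-square power calculation of the kind
# described above.
from statsmodels.stats.power import GofChisquarePower

alpha_overall = 0.05
n_tests = 13
alpha_per_test = alpha_overall / n_tests      # ~.0038, as reported above

power_calc = GofChisquarePower()
n_needed = power_calc.solve_power(
    effect_size=0.10,   # hypothetical small effect (Cohen's w)
    n_bins=4,           # hypothetical number of solution-process categories
    alpha=alpha_per_test,
    power=0.80,
)
print(round(alpha_per_test, 4), round(n_needed))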

Although the differences found in the primary grades by other researchers may no longer exist at the fifth-grade level, further research is needed before any conclusions can be drawn. A longitudinal study could follow students through their elementary years and monitor the strength of gender-related differences in problem-solving process preferences as they progress through the system. A replication of this study with a much larger sample would also help to clarify the extent to which gender differences in problem-solving processes exist in the fifth grade.

Although the data from this study could not clarify the extent to which the unsuccessful response themes of faulty mathematical logic or incoherent responses were related to reading, the data showed that reading issues were strongly related to students' performance on the problem-solving test. Further research designed to incorporate follow-up interviews with all students whose responses are coded with these themes may help to illuminate the full extent of the relationship between reading difficulties and problem-solving performance.
This study found that among the three independent variables, reading ability had the strongest relationship to students' linguistic explanation scores. This research cannot, however, delineate which elements of the reading process are most strongly related to students' ability to explain their mathematical problem-solving processes. Because this study was not experimental in nature, it cannot shed any light on which pedagogical practices for reading would have the greatest impact on students' linguistic explanations. Additional research is needed in both of these areas.

The students in this sample tended to be over-confident in their feelings of self-efficacy, with boys being more over-confident than girls. Previous research has shown that self-efficacy is positively related to performance, yet this study found that students who scored higher mathematically tended to be more accurate in their feelings of self-efficacy. Future research is needed to determine whether there is a causal relationship between self-efficacy bias and performance. Is there a point at which a student's level of over-confidence begins to impede performance? Is it better for a student to be overly optimistic about his/her ability to be successful on a task than to be more realistic? Would students benefit from routinely reporting their self-efficacy at the onset of mathematical tasks and checking it against their actual performance?
Similar questions remain unanswered for self-assessment. The act of self-assessment has been previously shown to improve performance, but no studies have clarified the relationship between the accuracy of the self-assessments and the impact on performance. The moderate correlation between the two variables found in this study indicates that a relationship exists, but the correlation alone does not clearly define it. Also, because this study was non-experimental, it does not provide evidence as to which classroom activities would prove to be the most beneficial for improving the accuracy of students' self-assessments. Self-assessment is an understudied phase of the problem-solving process and deserves more attention so that teachers can be sufficiently equipped to guide their students toward becoming competent assessors of their own work.

For some of the outcome variables, the descriptive results show differences between groups that the inferential tests did not find to be significant. A replication of this study with a similar but larger sample would increase the power of the study to detect smaller differences and smaller effect sizes. If the new results were added to the results of this study, the researcher could more clearly interpret whether non-significant results were indicative of a lack of differences in the population.

Finally, the test items used for data collection in this study were limited to the mathematics standard of number and operations. Therefore, the results of this study cannot be generalized across all mathematics content. A replication of this study using test items that cover other mathematics standards would add to these findings about the relationships among the independent and dependent variables.

PAGE 337

REFERENCES

Ackerman, P. T., & Dykman, R. A. (1995). Reading-disabled students with and without comorbid arithmetic disability. Developmental Neuropsychology, 11, 351-371.
Adams, T. L. (2003). Reading mathematics: More than words can say. The Reading Teacher, 56, 786-795.
Ai, X. (2002). Gender differences in growth in mathematics achievement: Three-level longitudinal and multilevel analyses of individual, home, and school influences. Mathematical Thinking and Learning, 4, 1-22.
Alexander, P. (1995). Superimposing a situation-specific and domain-specific perspective on an account of self-regulated learning. Educational Psychologist, 30, 189-193.
Alloway, N., & Gilbert, P. (1997). Boys and literacy: Lessons from Australia. Gender and Education, 9, 49-58.
American Association of University Women (2002). Course-taking patterns. In The Jossey-Bass reader on gender in education (pp. 277-300). San Francisco, CA: Jossey-Bass.
American Psychological Association (1985). Standards for educational and psychological testing. Washington, DC: Author.
Anderson, R. C., Pichert, J. W., Goetz, E. T., Schallert, D. L., Stevens, K. V., & Trollip, S. R. (1976). Instantiation of general terms. Journal of Verbal Learning and Verbal Behavior, 15, 667-679.
Artzt, A. F., & Armour-Thomas, E. (1992). Development of a cognitive-metacognitive framework for protocol analysis of mathematical problem solving in small groups. Cognition and Instruction, 9, 137-175.
Ary, D., Jacobs, L. C., & Razavieh, A. (1996). Introduction to research in education (5th ed.). Fort Worth, TX: Harcourt Brace College Publishers.
Baker, D. P., & Jones, D. P. (1993). Creating gender-equality: Cross-national gender stratification and mathematical performance. Sociology of Education, 66, 1-103.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Barrs, M. (2000). Gendered literacy? Language Arts, 77, 287-293.
Barton, M. L., Heidema, C., & Jordan, D. (2002). Teaching reading in mathematics and science. Educational Leadership, 60(3), 24-28.
Basurto, I. (1999). Conditions of reading comprehension which facilitate word problems for second language learners. Reading Improvement, 36, 142-148.
Bernardo, A. B. (1999). Overcoming obstacles to understanding and solving word problems in mathematics. Educational Psychology, 19, 19-163.
Betz, N. E., & Hackett, D. (1983). The relationship of mathematics career-related self-efficacy expectations to the selection of science-based college majors. Journal of Vocational Behavior, 23, 329-345.
Boekaerts, M., Seegers, G., & Vermeer, H. J. (1995). Solving math problems: Where and why does the solution process go astray? Educational Studies in Mathematics, 28, 241-262.
Boggs, M. L. (1999). Reading boys: A poststructuralist inquiry into first-grade literacy and masculinity during the language arts block. Unpublished doctoral dissertation, University of South Florida, Tampa.
Bong, M. (2002). Predictive utility of subject-, task-, and problem-specific self-efficacy judgments for immediate and delayed academic performances. The Journal of Experimental Education, 70, 133-162.
Bottge, B. A. (1999). Effects of contextualized math instruction on problem solving of average and below-average achieving students. The Journal of Special Education, 33, 81-92.
Borasi, R., Siegel, M., & Fonzi, J. M. (1998). Using transactional reading strategies to support sense-making and discussion in mathematics classrooms: An exploratory study. Journal for Research in Mathematics Education, 29, 275-305.
Braten, I. (1998). Cognitive strategies in mathematics, part 1: On children's strategies for solving simple addition problems. Scandinavian Journal of Educational Research, 42, 5-24.
Bratina, T. A., & Lipkin, L. J. (2003). Watch your language! Recommendations to help students communicate mathematically. Reading Improvement, 40, 3-12.
Breland, H. M., Danos, D. O., Kahn, H. D., Kubota, M. Y., & Bonner, M. W. (1994). Performance versus objective testing and gender: An exploratory study of an advanced placement history examination. Journal of Educational Measurement, 31, 275-293.
Burns, M. (1995). Writing in math class? Absolutely! Instructor, 104, 40-47.
Capps, L. R., & Pickreign, J. (1993). Language connections in mathematics: A critical part of mathematics instruction. Arithmetic Teacher, 41, 8-12.
Cardelle-Elawar, M. (1994). Effects of metacognitive instruction on low achievers in mathematics problems. Teaching and Teacher Education, 11, 81-95.
Carr, M., & Davis, H. (2001). Gender differences in arithmetic strategy use: A function of skill and preference. Contemporary Educational Psychology, 26, 330-347.
Carr, M., Jessup, D. L., & Fuller, D. (1999). Gender differences in first-grade mathematics strategy use: Parent and teacher contributions. Journal for Research in Mathematics Education, 30, 20-46.
Charles, R., Lester, F., & O'Daffer, P. O. (1988). How to evaluate progress in problem solving. Reston, VA: National Council of Teachers of Mathematics.
Charmaz, K. (2000). Grounded theory: Objectivist and constructivist methods. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 509-535). Thousand Oaks, CA: Sage.
Chipman, S. F. (1996). Female participation in the study of mathematics, the US situation. In G. Hanna (Ed.), Towards gender equity in mathematics education (pp. 285-296). Netherlands: Kluwer Academic Publishers.
Chipman, S. F., Marshall, S. P., & Scott, P. A. (1991). Content effects on word problem performance: A possible source of test bias? American Educational Research Journal, 28, 897-915.
Choi, J., & Hannafin, M. (1997). The effects of instructional context and reasoning complexity on mathematics problem-solving. Educational Technology Research & Development, 45, 43-55.
Ciochine, J. G., & Polivka, G. (1997). The missing link? Writing in mathematics class! Mathematics Teaching in the Middle School, 2, 316-320.
Clarkson, S. P., & Williams, W. H. (1994, November). Are you assessing reading or mathematics? Paper presented at the Annual Meeting of the American Mathematics Association of Two-Year Colleges, Tulsa, OK.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Commeyras, M. (1999). How interested are literacy educators in gender issues? Survey results from the United States. Journal of Adolescent and Adult Literacy, 42, 352-362.
Conlin, M. (2003). The new gender gap: From kindergarten to grad school, boys are becoming the second sex. Business Week, May 26, 74-82.
Cook, J. (1995). Integrating math and writing. Teaching PreK-8, 25, 22-23.
Cooper, S. E., & Robinson, D. A. G. (1991). The relationship of mathematics self-efficacy beliefs to mathematics anxiety and performance. Measurement and Evaluation in Counseling and Development, 24, 4-11.
County School Board (2004). Elementary school FCAT scores: 2002-2003 school accountability report. Retrieved May 7, 2004, from county school board website.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Croom, L. (1997). Mathematics for all students: Access, excellence, and equity. In J. Trentacosta & M. J. Kenney (Eds.), Multicultural and gender equity in the mathematics classroom: 1997 yearbook (pp. 1-9). Reston, VA: National Council of Teachers of Mathematics.
Crowley, M. L. (1990). Criterion-referenced reliability indices associated with the van Hiele geometry test. Journal for Research in Mathematics Education, 21, 238-241.
Dale, E., & Chall, J. L. (1949). A formula for predicting readability. Educational Research Bulletin, 27, 11-20, 28.
Desoete, A., Roeyers, H., & Buysse, A. (2001). Metacognition and mathematical problem solving in grade 3. Journal of Learning Disabilities, 34, 435-449.
Dobson, J. (2001). Bringing up boys. Wheaton, IL: Tyndale.
Dowling, D. M. (1978). The development of a mathematics confidence scale and its application in the study of confidence in women college students. Unpublished doctoral dissertation, Ohio State University, Columbus.
Drake, B. M., & Amspaugh, L. B. (1994). What writing reveals in mathematics. Focus on Learning Problems in Mathematics, 16, 43-50.
Draper, R. J. (2002). School mathematics reform, constructivism, and literacy: A case for literacy instruction in the reform-oriented math classroom. Journal of Adolescent and Adult Literacy, 45, 520-529.
Dusterhoff, M. (1995). Why write in mathematics? Teaching PreK-8, 25, 48-49.
Earp, N. W., & Tanner, F. W. (1980). Mathematics and language. Arithmetic Teacher, 28, 32-34.
Eccles, J., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences in children's self- and task perceptions during elementary school. Child Development, 64, 830-847.
Ethington, C. A. (1992). Gender differences in a psychological model of mathematics achievement. Journal for Research in Mathematics Education, 23, 166-181.
Fan, N., Mueller, J. H., & Marini, A. E. (1994). Solving difference problems: Wording primes coordination. Cognition and Instruction, 12, 355-369.
Farrand, M. R. (2002). Gender and achievement in mathematical problem solving and attributions for mathematical achievements in grade three students. Unpublished doctoral dissertation, University of Windsor, Canada.
FCAT results: Ranking local schools. (2004, May 11). The Bradenton Herald, p. A8.
Feinberg, A. B., & Shapiro, E. S. (2003). Accuracy of teacher judgments in predicting oral reading fluency. School Psychology Quarterly, 18, 52-65.
Fennema, E. (1974). Mathematics learning and the sexes: A review. Journal for Research in Mathematics Education, 5, 126-139.
Fennema, E., & Carpenter, T. P. (1981). Sex-related differences in mathematics: Results from the National Assessment. Mathematics Teacher, 74, 554-559.
Fennema, E., Carpenter, T. P., Jacobs, V. R., Franke, M. L., & Levi, L. W. (1998). A longitudinal study of gender differences in young children's mathematical thinking. Educational Researcher, 27(5), 6-11.
Fierros, E. G. (1999, April). Examining gender differences in mathematics achievement on the Third International Mathematics and Science Study. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Quebec, Canada.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley & Sons.
Flem, A., & Sovik, N. (1993). Children with motor coordination problems and learning disabilities in reading, spelling, writing and arithmetic. European Journal of Special Needs Education, 8, 81-98.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 30, 221-233.
Forgasz, H. J., & Leder, G. C. (1996). Mathematics classrooms, gender and affect. Mathematics Education Research Journal, 8, 153-173.
Fortescue, C. M. (1994). Using oral and written language to increase understanding of math concepts. Language Arts, 71, 576-580.
Fouad, N. A., Smith, P. L., & Enochs, L. (1997). Reliability and validity evidence for the middle school self-efficacy scale. Measurement and Evaluation in Counseling and Development, 30, 17-31.
Fry, E. B. (1969). The readability graph validated at primary levels. Reading Teacher, 22, 534-538.
Fry, E. (1990). A readability formula for short passages. Journal of Reading, 33, 594-597.
Fuchs, L. S., & Fuchs, D. (2002). Mathematical problem-solving profiles of students with mathematics disabilities with and without comorbid reading disabilities. Journal of Learning Disabilities, 35, 563-573.
Fuchs, L. S., Fuchs, D., Eaton, S. B., Hamlett, C., & Karns, K. (2000). Supplementing teacher judgments of mathematics test accommodations with objective data sources. School Psychology Review, 29, 65-85.
Fuchs, L. S., Fuchs, D., Prentice, K., Burch, M., Hamlett, C. L., Owen, R., & Schroeder, K. (2003). Enhancing third-grade students' mathematical problem solving with self-regulated learning strategies. Journal of Educational Psychology, 95, 306-315.
Fuentes, P. (1998). Reading comprehension in mathematics. The Clearing House, 72, 81-88.
Gabrielson, S., Gordon, B., & Engelhard, G. (1995). The effects of task choice on the quality of writing obtained in a statewide assessment. Applied Measurement in Education, 8, 273-290.
Gallagher, A. M., & DeLisi, R. (1994). Gender differences in scholastic aptitude test: Mathematics problem solving among high-ability students. Journal of Educational Psychology, 86, 204-211.
Gallagher, A. M., DeLisi, R., Holst, P. C., McGillicuddy-DeLisi, A. V., Morely, M., & Cahalan, C. (2000). Gender differences in advanced mathematical problem solving. Journal of Experimental Child Psychology, 75, 165-190.
Gambell, T. J., & Hunter, D. M. (1999). Rethinking gender differences in literacy. Canadian Journal of Education, 24, 1-16.
Gambell, T. J., & Hunter, D. M. (2000). Surveying gender differences in Canadian school literacy. Journal of Curriculum Studies, 32, 689-719.
Gee, J. P. (1996). Social linguistics and literacies: Ideology in discourses. Philadelphia: Falmer Press.
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology. Boston: Allyn and Bacon.
Glass, G., Peckham, P., & Sanders, J. (1972). Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance. Review of Educational Research, 42, 237-288.
Goldman, S. R., & Hasselbring, T. S. (1997). Achieving meaningful mathematics literacy for students with learning disabilities. Journal of Learning Disabilities, 30, 198-208.
Goldsby, D. S., & Cozza, B. (2002). Writing samples to understand mathematical thinking. Mathematics Teaching in the Middle School, 7, 517-520.
Gormley, K. A. (1993, April). Gender and ability differences in children's writing. Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, GA.
Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education. New York: McGraw-Hill.
Gurganus, S., & Del Mastro, M. (1998). Mainstreaming kids with reading and writing problems: Special challenges of the mathematics classroom. Reading and Writing Quarterly, 14, 117-125.
Guthrie, J. T. (1981). Reading in New Zealand: Achievement and volume. Reading Research Quarterly, 15, 337-357.
Hackett, G. (1985). The role of mathematics self-efficacy in the choice of math-related majors of college women and men: A path analysis. Journal of Counseling Psychology, 32, 47-56.
Hackett, G., & Betz, N. E. (1989). An exploration of the mathematics self-efficacy/mathematics performance correspondence. Journal for Research in Mathematics Education, 20, 261-273.
Halpern, D. (1992). Sex differences in cognitive abilities. Hillsdale, NJ: Lawrence Erlbaum.
Halpern, D. F., & LaMay, M. L. (2000). The smarter sex: A critical review of sex differences in intelligence. Educational Psychology Review, 12, 229-246.
Hanlon, E. H., & Schneider, Y. (1999, April). Improving math proficiency through self-efficacy training. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal.
Hartman, M. (1994). Making sense of math through writing. Perspectives in Education and Deafness, 12, 6-9.
Hassmen, P., & Hunt, D. P. (1994). Human self-assessment in multiple-choice testing. Journal of Educational Measurement, 31, 149-160.
Hater, M. A., & Kane, R. B. (1975). The cloze procedure as a measure of mathematical English. Journal for Research in Mathematics Education, 6, 121-127.
Hebert, J. C., Carpenter, T. P., Fennema, E., Fuson, K. C., Human, P. G., Murray, H. G., Olivier, A. I., & Wearne, D. (1996). Problem solving as a basis for reform in curriculum and instruction: The case of mathematics. Educational Researcher, 25, 12-21.
Hecht, S. A., Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (2001). The relations between phonological processing abilities and emerging individual differences in mathematical computation skills: A longitudinal study from second to fifth grades. Journal of Experimental Child Psychology, 79, 192-227.
Hegarty, M., Mayer, R. E., & Monk, C. A. (1995). Comprehension of arithmetic word problems: A comparison of successful and unsuccessful problem solvers. Journal of Educational Psychology, 87, 18-32.
Helwig, R., Anderson, L., & Tindal, G. (2001). Influence of elementary student gender on teachers' perceptions of mathematics achievement. The Journal of Educational Research, 95, 93-102.
Helwig, R., Rozek-Tedesco, M. A., Tindal, G., Heath, B., & Almond, P. (1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. Journal of Educational Research, 93, 113-125.
Helwig, R., Rozek-Tedesco, M. A., & Tindal, G. (2002). An oral versus a standard administration of a large-scale mathematics test. The Journal of Special Education, 36, 39-47.
Hildebrand, C., Ludeman, C. J., & Mullin, J. (1999). Integrating mathematics with problem solving using the mathematician's chair. Teaching Children Mathematics, 5, 434-441.
Hill, P. W., & Rowe, K. J. (1998). Modeling student progress in studies of educational effectiveness. School Effectiveness and School Improvement, 7, 310-333.
Homan, S., Hewitt, M., & Linder, J. (1994). The development and validation of a formula for measuring single-sentence test item readability. Journal of Educational Measurement, 31, 349-358.
Hong, E. (1995). Mental models in word problem-solving: A comparison between American and Korean sixth-grade students. Applied Cognitive Psychology, 9, 123-142.
Hunsader, P. D. (2002a). Mathematical language: Comprehension, readability, and testing. Florida Reading Quarterly, 39(2), 6-10.
Hunsader, P. D. (2002b). Why boys fail and what we can do about it. Principal, 82(2), 52-54.
Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107, 139-155.
Jackson, D. (1998). Breaking out of the binary trap: Boys' underachievement, schooling, and gender relations. In E. Epstein, J. Elwood, V. Hey, & J. Maw (Eds.), Failing boys? Issues in gender and achievement (pp. 77-95). Philadelphia: Open University Press.
Jitendra, A. K., & Xin, Y. P. (1997). Mathematical word-problem-solving instruction for students with mild disabilities and students at risk for math failure: A research synthesis. The Journal of Special Education, 30, 412-438.
Johnson, E. S. (2000). The effects of accommodations on performance assessments. Remedial and Special Education, 21, 261-267.
Johnson, T. M., Jones, G. A., Thornton, C. A., Langrall, C. W., & Rous, A. (1998). Students' thinking and writing in the context of probability. Written Communication, 15, 203-229.
John-Steiner, V. (1989). Is mathematics a language? In P. Connolly & T. Vilardi (Eds.), Writing to learn mathematics and science (pp. 285-289). New York: Teachers College Press.
Jones, C. J. (2001). CBAs that work: Assessing students' math content-reading levels. Teaching Exceptional Children, 34, 24-28.
Jordan, N. C., & Hanich, L. B. (2000). Mathematical thinking in second-grade children with different forms of LD. Journal of Learning Disabilities, 33, 567-578.
Jordan, N. C., Kaplan, D., & Hanich, L. B. (2002). Achievement growth in children with learning difficulties in mathematics: Findings of a two-year longitudinal study. Journal of Educational Psychology, 94, 586-597.
Jordan, N. C., & Montani, T. O. (1997). Cognitive arithmetic and problem solving: A comparison of children with specific and general mathematics difficulties. Journal of Learning Disabilities, 30, 624-634.
Junge, M. E., & Dretzke, B. J. (1995). Mathematical self-efficacy gender differences in gifted/talented adolescents. Gifted Child Quarterly, 39, 22-28.
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103, 582-591.
Kane, R. B. (1970). The readability of mathematics textbooks revisited. Mathematics Teacher, 63, 579-581.
Kang, H., & Pham, K. T. (1995, March). From 1 to Z: Integrating math and language learning. Paper presented at the 29th annual convention of the Teachers of English to Speakers of Other Languages, Long Beach, CA.
Kelly, R. R., & Mousley, K. (2001). Solving word problems: More than reading issues for deaf students. American Annals of the Deaf, 146, 251-262.
Kenyon, R. W. (1989). Writing is problem solving. In P. Connolly & T. Vilardi (Eds.), Writing to learn mathematics and science (pp. 73-94). New York: Teachers College Press.
King, J. R. (in press). What can he want? Male teachers, young children, and teaching desire. In W. Martino & M. Mills (Eds.), Masculinities and schooling. Mahwah, NJ: Erlbaum.
Kiplinger, V. L., Haug, C. A., & Abedi, J. (2000, April). Measuring math, not reading, on a math assessment: A language accommodations study of English language learning and other special populations. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Knudson, R. E. (1995). Writing experiences, attitudes, and achievement of first to sixth graders. Journal of Educational Research, 89(2), 90-97.
Koivula, N., Hassmen, P., & Hunt, D. P. (2001). Performance on the Swedish scholastic aptitude test: Effects of self-assessment and gender. Sex Roles, 44, 629-633.
Kramarski, B., Mevarech, Z. R., & Arami, M. (2002). The effects of metacognitive instruction on solving mathematical authentic tasks. Educational Studies in Mathematics, 49, 225-250.
Kroll, L., & Halaby, M. (1997). Writing to learn mathematics in the primary school. Young Children, 52, 54-60.
Krulik, S., & Rudnick, J. A. (1987). Problem solving: A handbook for teachers (2nd ed.). Boston: Allyn and Bacon.
Krulik, S., & Rudnick, J. A. (1994). Reflect for better problem solving and reasoning. Arithmetic Teacher, 41, 334-335.
Krulik, S., & Rudnick, J. A. (1996). The new sourcebook for teaching reasoning and problem solving in junior and senior high school. Boston: Allyn and Bacon.
Landi, M. A. G. (2001). Helping students with learning disabilities make sense of word problems. Intervention in School and Clinic, 37, 13-18, 30.
Lane, S., Wang, N., & Magone, M. (1996). Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational Measurement, 15(4), 21-27.
Laveault, D., Leblanc, R., & Leroux, J. (1999, August). Self-regulated learning of young adolescents in a mathematics activity. Paper presented at the Annual Conference of the American Psychological Association, Boston, MA.
Leahey, E., & Guo, G. (2001). Gender differences in mathematical trajectories. Social Forces, 80, 713-732.
LeBlanc, J. F., Proudfit, L., & Putt, I. J. (1980). Teaching problem solving in the elementary school. In S. Krulik & R. E. Reys (Eds.), Problem solving in school mathematics: 1980 NCTM yearbook (pp. 104-116). Reston, VA: National Council of Teachers of Mathematics.
Leder, G. C. (1990). Gender differences in mathematics: An overview. In E. Fennema & G. C. Leder (Eds.), Mathematics and gender (pp. 10-26). New York: Teachers College Press.
Leinhardt, G., & Schwartz, B. B. (1997). Seeing the problem: An explanation from Polya. Cognition and Instruction, 15, 395-434.
Lemke, M., Calsyn, C., Lippman, L., Jocelyn, L., Kastberg, D., Liu, T., Roey, S., Williams, T., Kruger, T., & Bairu, G. (2001). Outcomes of learning: Results from the 2000 Program for International Student Assessment of 15-year-olds in reading, mathematics, and science literacy. Washington, DC: U. S. Department of Education, National Center for Education Statistics.
Lent, R. W., Brown, S. D., & Gore, P. A. (1997). Discriminant and predictive validity of academic self-concept, academic self-efficacy, and mathematics-specific self-efficacy. Journal of Counseling Psychology, 44, 307-315.
Leong, C. K., & Jerred, W. D. (2001). Effects of consistency and adequacy of language information on understanding elementary mathematics word problems. Annals of Dyslexia, 51, 277-298.
Lichtenstein, S., & Fischhoff, B. (1977). Do those who know more also know more about how much they know? Organizational Behavior and Human Performance, 20, 159-183.
Lundeberg, M. A., Fox, P. W., & Puncochar, J. (1994). Highly confident but wrong: Gender differences and similarities in confidence judgments. Journal of Educational Psychology, 86, 114-121.
Maccoby, E. E. (1966). Sex differences in intellectual functioning. In E. E. Maccoby (Ed.), The development of sex differences. Stanford: Stanford University Press.
MacGregor, M. (1990). Reading and writing in mathematics. In J. Bickmore-Brand (Ed.), Language in mathematics (pp. 100-108). Portsmouth, NH: Heinemann.
Maddux, S. (2002). Self-efficacy: The power of believing you can. In C. R. Snyder & S. J. Lopez (Eds.), Handbook of positive psychology (pp. 277-287). London: Oxford University Press.
Malecki, C. K., & Jewell, J. (2003). Developmental, gender, and practical consideration in scoring curriculum-based measurement writing probes. Psychology in the Schools, 40, 379-390.
Malpass, J. R. (1996, April). Self-regulation, goal orientation, self-efficacy, and math achievement. Paper presented at the Annual Meeting of the American Educational Research Association, New York.
Manning, M. M. (1999). Building reading skills in math. Teaching PreK-8, 29, 85-86.
McCombs, B. L., & Marzano, R. J. (1990). Putting the self in self-regulated learning: The self as agent in integrating will and skill. Educational Psychologist, 25, 51-69.
McCoy, L. P. (1994). Mathematical problem-solving processes of elementary male and female students. School Science and Mathematics, 94, 266-270.
McIntosh, M. E. (1997). Guide students to better comprehension of word problems. Clearing House, 71, 26-32.
McIntosh, M. E., & Draper, R. J. (1996). Using the question-answer relationship strategy to improve students' reading of mathematics texts. Clearing House, 69, 154-162.
McLeod, D. B. (1989). The role of affect in mathematical problem solving. In D. B. McLeod & V. M. Adams (Eds.), Affect and mathematical problem solving: A new perspective (pp. 20-36). New York: Springer-Verlag.
Michaels, R. (2002). The relationships among problem solving performance, gender, confidence, and attributional style in third-grade mathematics. Unpublished doctoral dissertation, University of San Francisco.
Miller, L. D. (1993). Making the connection with language. Arithmetic Teacher, 40, 311-316.
Montague, M. (1997). Cognitive strategy instruction in mathematics for students with learning disabilities. Journal of Learning Disabilities, 30, 164-177.
Morales, R. V. (1998, February). Comprehension and solution patterns of simple math word problems by Mexican-American, bilingual, elementary school students. Paper presented at the annual meeting of the National Association for Bilingual Education, Dallas, TX.
Mosenthal, P. B., & Kirsch, I. S. (1993). Profiling students' quantitative literacy abilities: An approach with precision. Journal of Reading, 36, 668-674.
Mulrine, A. (2001). Are boys the weaker sex? U.S. News and World Report, 131, 40-47.
Multon, K. D., Brown, S. D., & Lent, R. W. (1991). Relation of self-efficacy beliefs to academic outcomes: A meta-analytic investigation. Journal of Counseling Psychology, 38, 30-38.
Muth, K. D. (1997). Using cooperative learning to improve reading and writing in mathematical problem solving. Reading and Writing Quarterly, 13, 71-82.
National Assessment of Educational Progress (1986). The mathematics report card: Are we measuring up? (ETS Publication No. 17-M-01). Princeton, NJ: Educational Testing Service.
National Assessment Governing Board (1995). Mathematics framework for the 1996 National Assessment of Educational Progress. Washington, DC: Author.
National Center for Education Statistics (2000). Trends in educational equity of girls and women. Washington, DC: U. S. Department of Education.
National Center for Education Statistics (2003). National Assessment of Educational Progress released items. Retrieved January 16, 2004, from https://www.nces.ed.gov/nationsreportcard/itmrls/printproduce.asp
National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics (1995). Assessment standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics. Reston, VA: Author.
Neef, N. A., Nelles, D. E., Iwata, B. A., & Page, T. J. (2003). Analysis of precurrent skills in solving mathematics word problems. Journal of Applied Behavior Analysis, 36, 21-33.
Newkirk, T. (2002). Misreading masculinity. Portsmouth, NH: Heinemann.
O'Brien, V., Kopala, M., & Martinez-Pons, M. (1999). Mathematics self-efficacy, ethnic identity, gender, and career interests related to mathematics and science. The Journal of Educational Research, 92, 231-235.
Pajares, F. (1996). Self-efficacy beliefs in academic settings. Review of Educational Research, 66, 543-578.
Pajares, F., & Miller, M. D. (1994). Role of self-efficacy and self-concept beliefs in mathematical problem solving: A path analysis. Journal of Educational Psychology, 86, 193-203.
Pajares, F., & Miller, M. D. (1995). Mathematics self-efficacy and mathematics performance: The need for specificity of assessment. Journal of Counseling Psychology, 42(2), 190-198.
Pajares, F., & Miller, M. D. (1997). Mathematics self-efficacy and mathematical problem solving: Implications of using different forms of assessment. The Journal of Experimental Education, 65, 213-228.
Pajares, F., Miller, M. D., & Johnson, M. J. (1999). Gender differences in writing self-beliefs of elementary school students. Journal of Educational Psychology, 91, 50-61.
Palardy, J. M. (1998). The effects of teachers' expectations on children's literacy development. Reading Improvement, 35, 184-186.
Pallier, G., Wilkinson, R., Danthir, V., Kleitman, S., Knezivic, G., Stankov, L., & Roberts, R. D. (2002). The role of individual differences in the accuracy of confidence judgments. The Journal of General Psychology, 129, 257-299.
Pallier, G. (2003). Gender differences in the self-assessment of accuracy on cognitive tasks. Sex Roles, 48, 265-276.
Paris, S. G., & Paris, A. H. (2001). Classroom applications of research on self-regulated learning. Educational Psychologist, 36, 89-101.
Park, H., Bauer, S. C., & Sullivan, L. M. (1998). Gender differences among top performing elementary school students in mathematical ability. Journal of Research and Development in Education, 31, 133-141.
Passolunghi, M. C., & Siegel, L. S. (2001). Short-term memory, working memory, and inhibitory control in children with difficulties in arithmetic problem solving. Journal of Experimental Child Psychology, 80, 44-57.
Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd ed.). Thousand Oaks, CA: Sage.
Pau, C. S. (1995). The deaf child and solving problems of arithmetic: The importance of comprehensive reading. American Annals of the Deaf, 140, 287-290.
Pengelly, H. (1990). Acquiring the language of mathematics. In J. Bickmore-Brand (Ed.), Language in mathematics (pp. 10-26). Carlton South, Victoria, Australia: Heinemann.
Peressini, D., & Bassett, J. (1996). Mathematical communication in students responses to a performance-assessment task. In P. S. Elliott & M. J. Kenney (Eds.), Communications in mathematics, K-12 and beyond: 1996 NCTM yearbook, (pp. 146-158). Reston, VA: National Council of Teachers of Mathematics. Peterson, S. (1998). Evaluation and teachers perceptions of gender in sixth-grade student writing. Research in the Teaching of English, 33, 181-208. Peterson, S. (2000). Grades four and eight students and teachers perceptions of girls and boys writing competencies. Reading Horizons, 40, 253-271. Peterson, S., & Bainbridge, J. (1999). Teachers gendered expectations and their evaluation of student writing. Reading Research and Instruction, 38, 255-271. Phillips, L. M., Norris, S. P., Osmond, W. C., & Maynard, A. M. (2002). Relative reading achievement: A longitudinal study of 187 children from first through sixth grades. Journal of Educational Psychology, 94, 3-13. Pietsch, J., Walker, R., & Chapman, E. (2003). The relationship among self-concept, self-efficacy, and performance in mathematics during secondary school. Journal of Educational Psychology, 95, 589-603. Pintrich, P. R., & Schrauben, B. (1992). Students motivational beliefs and their cognitive engagement in classroom academic tasks. In D. H. Schunk & J. L. Meece (Eds.), Student perceptions in the classroom (pp. 149-183). Hillsdale, NJ: Erlbaum. 345


Polya, G. (1957). How to solve it (2nd ed.). Princeton, NJ: Princeton University Press.
Pomplun, M., & Capps, L. (1999). Gender differences for constructed-response mathematics items. Educational and Psychological Measurement, 59, 597-614.
Pottorff, D. D., Phelps-Zientarski, D., & Skovera, M. E. (1996). Gender perceptions of elementary and middle school students about literacy at school and home. Journal of Research and Development in Education, 29, 203-209.
Powell, A. B. (1997). Capturing, examining, and responding to mathematical thinking through writing. The Clearing House, 71, 21-25.
Pressley, M. (1995). More about the development of self-regulation: Complex, long-term, and thoroughly social. Educational Psychologist, 30, 207-212.
Pugalee, D. (1997). Connecting writing to the mathematics curriculum. Mathematics Teacher, 90, 308-310.
Pugalee, D. (1998). Promoting mathematical learning through writing. Mathematics in School, 27, 20-22.
Pugalee, D. (2001). Writing, mathematics, and metacognition: Looking for connections through students' work in mathematical problem solving. School Science and Mathematics, 101, 236-245.
Quinn, R. J., & Wilson, M. M. (1997). Writing in the mathematics classroom: Teacher beliefs and practices. The Clearing House, 71, 14-20.


Randhawa, B. S., Beamer, J. E., & Lundberg, I. (1993). Role of mathematics self-efficacy in the structural model of mathematics achievement. Journal of Educational Psychology, 85, 41-48.
Reed, S. K. (1999). Word problems: Research and curriculum reform. Mahwah, NJ: Lawrence Erlbaum.
Reehm, S. P., & Long, S. A. (1996). Reading in the mathematics classroom. Middle School Journal, 27(5), 35-41.
Rosenblatt, L. M. (1978). The reader, the text, the poem: The transactional theory of the literary work. Carbondale, IL: Southern Illinois University Press.
Rosenholtz, S. J., & Simpson, C. (1984). The formation of ability conceptions: Developmental trend or social construction? Review of Educational Research, 54, 31-63.
Routman, R. (1994). Invitations: Changing as teachers and learners, K-12. Portsmouth, NH: Heinemann.
Rowe, K. J. (2000, August). Problems in the education of boys and exploring real effects from evidence-based research: Useful findings in teaching and learning for boys and girls. Background paper of keynote address presented at the Teaching Boys Developing Fine Men Conference, Brisbane, Australia.
Ruddell, M. R. (1997). Teaching content area reading and writing (2nd ed.). Boston: Allyn & Bacon.


Rudnitsky, A., Etheredge, S., Freeman, S. J. M., & Gilbert, T. (1995). Learning to solve addition and subtraction word problems through a structure-plus-writing approach. Journal for Research in Mathematics Education, 26, 467-486.
Schell, V. J. (1982). Learning partners: Reading and mathematics. The Reading Teacher, 35, 544-548.
Schoenfeld, A. H. (1983). Episodes and executive decisions in mathematical problem solving. In R. Lesh & M. Landau (Eds.), Acquisition of mathematics concepts and processes (pp. 345-395). New York: Academic.
Schoenfeld, A. H. (1989). Teaching mathematical thinking and problem solving. In L. B. Resnick & L. E. Klopfer (Eds.), Toward the thinking curriculum: Current cognitive research (pp. 83-103). [ASCD Yearbook]. Alexandria, VA: Association for Supervision and Curriculum Development.
Schraw, G. (1995). Measures of feeling-of-knowing accuracy: A new look at an old problem. Applied Cognitive Psychology, 9, 321-332.
Schunk, D. H. (1989). Social cognitive theory and self-regulated learning. In B. J. Zimmerman & D. H. Schunk (Eds.), Self-regulated learning and academic achievement (pp. 83-110). New York: Springer-Verlag.
Schunk, D. H. (1995, April). Self-monitoring and skill acquisition through self-evaluation of capabilities. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.


Schunk, D. H. (1996, October). Self-evaluation and self-regulated learning. Paper presented at the Graduate School and University Center, City University of New York, New York, NY.
Schunk, D. H., & Gunn, T. P. (1986). Self-efficacy and skill development: Influence of task strategies and attributions. Journal of Educational Research, 79, 238-244.
Seegers, G., & Boekaerts, M. (1996). Gender-related differences in self-referenced cognitions in relation to mathematics. Journal for Research in Mathematics Education, 27, 215-240.
Shepard, R. G. (1993). Writing for conceptual development in mathematics. Journal of Mathematical Behavior, 12, 287-293.
Smith, D. (1973). Yes, American schools are feminized. Phi Delta Kappan, 54, 703-704.
Sovik, N., Frostad, P., & Heggberget, M. (1999). The relation between reading comprehension and task-specific strategies used in arithmetical word problems. Scandinavian Journal of Educational Research, 43, 371-398.
Sovik, N., Frostad, P., & Lie, A. (1994). Can discrepancies between IQ and basic skills be explained by learning strategies? British Journal of Educational Psychology, 64, 389-405.
Spivey, N. N., & King, J. R. (1994). Readers as writers composing from sources. In R. B. Ruddell, M. R. Ruddell, & H. Singer (Eds.), Theoretical models and processes of reading (pp. 668-694). Newark, DE: International Reading Association.


State of Florida Department of Education (1996). Sunshine State Standards: Mathematics. Retrieved July 16, 2004, from http://www.firn.edu/doe/curric/prek12/pdf/math3.pdf
State of Florida Department of Education (2001). FCAT performance task scoring practice for educators: Grade 5 mathematics. Tallahassee, FL: Florida Department of Education.
State of Florida Department of Education (2001). Florida Comprehensive Assessment Test, Grade 5 Mathematics, Released Items. Retrieved January 16, 2004, from http://www.firn.edu/doe/sas/fcat/pdf/fc5mib1r.pdf
State of Florida Department of Education (2004). Understanding FCAT reports 2004. Retrieved May 31, 2004, from http://www.firn.edu/doe/sas/fcat/pdf/fc_ufr2004.pdf
State of Florida Department of Education (2005). Evaluation and Reporting. Retrieved May 14, 2005, from http://www.firn.edu/doe/evaluation/home0018.htm
Stevens, C. A., & Zaharias, J. A. (1997). Problem-solution relationship instruction: A method for enhancing students' comprehension of word problems. Focus on Learning Problems in Mathematics, 19, 14-42.
Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Stevens, J. (1999). Intermediate statistics (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.


Stipek, D. J., & Gralinski, J. H. (1991). Gender differences in children's achievement-related beliefs and emotional responses to success and failure in mathematics. Journal of Educational Psychology, 83(3), 361-371.
Tartre, L. A., & Fennema, E. (1995). Mathematics achievement and gender: A longitudinal study of selected cognitive and affective variables. Educational Studies in Mathematics, 28, 199-217.
Thompson, D. R., & Rubenstein, R. N. (2000). Learning mathematics vocabulary: Potential pitfalls and instructional strategies. Mathematics Teacher, 93, 568-574.
Thompson, G. W., Strackbein, D., & Williams, J. D. (1992). Gender differences in an experimental program on arithmetic problem solving and computation. Mid-Western Educational Researcher, 5, 20-23.
Thorndike, E. L. (1917). Reading as reasoning: A study of mistakes in paragraph reading. Journal of Educational Psychology, 8, 323-332.
Tierney, R. J., & Mosenthal, J. (1985). Discourse comprehension and production: Analyzing text structure and cohesion. In H. Singer & R. B. Ruddell (Eds.), Theoretical models and processes of reading (pp. 55-104). Newark, DE: International Reading Association.
Tindal, G., Heath, B., Hollenbeck, K., Almond, P., & Harniss, M. (1998). Accommodating students with disabilities on large-scale tests: An empirical study of student response and test administration demands. Exceptional Children, 64, 439-450.


Tovani, C. (2000). I read it, but I don't get it: Comprehension strategies for adolescent readers. Portland, ME: Stenhouse Publishers.
Usiskin, Z. (1996). Mathematics as a language. In P. S. Elliott & M. J. Kenney (Eds.), Communications in mathematics, K-12 and beyond: 1996 NCTM yearbook (pp. 231-243). Reston, VA: National Council of Teachers of Mathematics.
Van de Walle, J. A. (2001). Elementary school mathematics. Boston, MA: Pearson Custom Publishing.
Van Dijk, T. A., & Kintsch, W. (1977). Cognitive psychology and discourse: Recalling and summarizing stories. In W. U. Dressler (Ed.), Current trends in textlinguistics (pp. 794-812). New York: De Gruyter.
van Kraayenoord, C. E., & Paris, S. G. (1997). Australian students' self-appraisal of their work samples and academic progress. The Elementary School Journal, 97, 523-537.
Vermeer, H. J., Boekaerts, M., & Seegers, G. (2000). Motivational and gender differences: Sixth-grade students' mathematical problem-solving behavior. Journal of Educational Psychology, 93(2), 308-315.
Vermeer, H., Seegers, G., & Boekaerts, M. (1994). Solving math problems: Degree of association between confidence ratings and effective strategy use. Paper presented at the Annual Conference of the Dutch Educational Research Association, Amsterdam.


Vogel, S. (1990). Gender differences in intelligence, language, visual-motor abilities, and academic achievement in students with learning disabilities: A review of the literature. Journal of Learning Disabilities, 23, 44-52.
Vygotsky, L. S. (1987). Thinking and speech. In A. S. Carton (Ed.), The collected works of L. S. Vygotsky (pp. 39-243). New York: Plenum Press.
Wagemaker, H. (1993). Reading achievement of New Zealand 9- and 14-year-olds. In H. Wagemaker (Ed.), Achievement in reading literacy: New Zealand's performance in a national and international context (pp. 33-58). Wellington, New Zealand: Ministry of Education.
Wakefield, D. V. (2000). Math as a second language. The Educational Forum, 64, 272-279.
Walkerdine, V. (1998). Counting girls out: Girls and mathematics. London: Falmer Press.
Walsh, M., Hickey, C., & Duffy, J. (1999). Influence of item content and stereotype situation on gender differences in mathematical problem solving. Sex Roles, 41, 219-239.
Wardrop, H. (1993). Mathematics language problems. The Australian Mathematics Teacher, 49, 10-13.
Webster, A., & Henriksson, W. (2000). The interaction between item format and gender differences in mathematics performance based on TIMSS data. Studies in Educational Evaluation, 26, 79-90.


Wilkinson, I. A. G. (1998). Dealing with diversity: Achievement gaps in reading literacy among New Zealand students. Reading Research Quarterly, 33, 144-167.
Williams, J. E. (1994). Gender differences in high school students' efficacy-expectation/performance discrepancies across four subject matter domains. Psychology in the Schools, 31, 232-237.
Williams, R. L., & Yore, L. D. (1985). Content, format, gender, and grade level differences in elementary students' ability to read science materials as measured by the cloze procedure. Journal of Research in Science Teaching, 22, 81-88.
Willingham, W. W., & Cole, N. S. (1997). Gender and fair assessment. Mahwah, NJ: Lawrence Erlbaum.
Wilson, L. D., & Zhang, L. (1998, April). A cognitive analysis of gender differences on constructed-response and multiple-choice assessments in mathematics. Paper presented at the Annual Meeting of the American Educational Research Association, San Diego, CA.
Winne, P. H. (1995). Inherent details in self-regulated learning. Educational Psychologist, 30, 173-187.
Winne, P. H. (1996). A metacognitive view of individual differences in self-regulated learning. Learning and Individual Differences, 8, 327-353.
Wong, R. M., Lawson, M. J., & Reeves, J. (2002). The effects of self-explanation training on students' problem solving in high-school mathematics. Learning and Instruction, 12, 233-262.


Woodward, J., Monroe, K., & Baxter, J. (2001). Enhancing student achievement on performance assessments in mathematics. Learning Disabilities Quarterly, 24, 33-46.
Wright, C. R., & Houck, J. W. (1995). Gender differences among self-assessments, teacher ratings, grades, and aptitude test scores for a sample of students attending rural secondary schools. Educational and Psychological Measurement, 55, 743-752.
Yates, J. F. (1990). Judgment and decision making. Englewood Cliffs, NJ: Prentice Hall.
Young, J. P., & Brozo, W. G. (2001). Boys will be boys, or will they? Literacy and masculinities. Reading Research Quarterly, 36, 316-325.
Zambo, R. W. (1990). An investigation of possible gender-related differences in the process of solving arithmetic word problems in the sixth and eighth grades. Unpublished doctoral dissertation, University of South Florida, Tampa.
Zambo, R., & Follman, J. (1994). Gender-related differences in problem solving at the 6th and 8th grade levels. Focus on Learning Problems in Mathematics, 16, 20-38.
Zambo, R., & Hess, R. K. (1996). The gender differential effects of a procedural plan for solving mathematical word problems. School Science and Mathematics, 96, 362-370.


Zhang, L., Wilson, L., & Manon, J. (1999, April). An analysis of gender differences on performance assessment in mathematics: A follow-up study. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Quebec, Canada.
Zimmerman, B. J. (1989). A social cognitive view of self-regulated academic learning. Journal of Educational Psychology, 81, 329-339.
Zimmerman, B. J. (1990). Self-regulated learning and academic achievement. Educational Psychologist, 25, 3-17.
Zimmerman, B. J., & Martinez-Pons, M. (1990). Student differences in self-regulated learning: Relating grade, sex, and giftedness to self-efficacy and strategy use. Journal of Educational Psychology, 82, 51-59.


APPENDIX A
FCAT MATHEMATICS PERFORMANCE SCORING RUBRIC


FCAT MATHEMATICS PERFORMANCE SCORING RUBRIC

4 Points: A score of four is a response in which the student demonstrates a thorough understanding of the mathematics concepts and/or procedures embodied in the task. The student has responded correctly to the task, used mathematically sound procedures, and provided clear and complete explanations and interpretations. The response may contain minor flaws that do not detract from the demonstration of a thorough understanding.

3 Points: A score of three is a response in which the student demonstrates an understanding of the mathematics concepts and/or procedures embodied in the task. The student's response to the task is essentially correct, with the mathematical procedures used and the explanations and interpretations provided demonstrating an essential, but less than thorough, understanding. The response may contain minor flaws that reflect inattentive execution of mathematical procedures or indications of some misunderstanding of the underlying mathematics concepts and/or procedures.

2 Points: A score of two indicates that the student has demonstrated only a partial understanding of the mathematics concepts and/or procedures embodied in the task. Although the student may have used the correct approach to obtaining a solution or may have provided a correct solution, the student's work lacks the essential understanding of the underlying mathematical concept. The response contains errors related to misunderstanding important aspects of the task, misuse of mathematical procedures, or faulty interpretations of results.

1 Point: A score of one indicates that the student has demonstrated a very limited understanding of the mathematics concepts and/or procedures embodied in the task. The student's response is incomplete and exhibits many flaws. Although the student's response has addressed some of the conditions of the task, the student reached an inadequate conclusion and/or provided reasoning that was faulty or incomplete. The response exhibits many flaws or may be incomplete.

0 Points: A score of zero indicates that the student has provided no response at all, or a completely incorrect or uninterpretable response, or demonstrated insufficient understanding of the mathematics concepts and/or procedures embodied in the task. For example, a student may provide some work that is mathematically correct, but the work does not demonstrate even a rudimentary understanding of the primary focus of the task.

Source: State of Florida Department of Education (2004), Understanding FCAT Reports 2004.


APPENDIX B
AMENDED RUBRIC FOR SCORING LINGUISTIC EXPLANATIONS


AMENDED RUBRIC FOR SCORING LINGUISTIC EXPLANATIONS

For all scoring, symbols included in a written explanation are regarded as text.

3 Points: A score of three indicates that the student has provided a thorough written explanation of the problem-solving process that conveys conceptual understanding and includes the context of the problem. The student has explained the strategies and/or steps used to solve the problem in an organized, clear, and complete manner. The writing may contain minor flaws that do not detract from the overall quality of the explanation.

2 Points: A score of two indicates that the student has provided only a partial written explanation of the problem-solving process that conveys some conceptual understanding. The student has explained some of the strategies used to solve the problem, but has omitted one or more important steps; the student has explained all of the computation, but has failed to include the problem context in the explanation; or the student has provided a reasonably clear explanation of seriously flawed mathematics. The writing may contain flaws that reflect inattention to detail or faulty interpretation of the process.

1 Point: A score of one indicates that the student has provided only a limited amount of accurate written text to explain the problem-solving process or the operation used. This text may take the form of labels, descriptors, or minor explanations, but fails to convey the essence of the process. The writing may contain flaws.

0 Points: A score of zero indicates that the student has not provided any written text to explain the problem-solving process, or has provided text that is incorrect and/or fails to relate to the problem. Restating the answer in words does not constitute text to explain the process.


APPENDIX C
PILOT STUDY RUBRIC FOR SCORING LINGUISTIC EXPLANATIONS


PILOT STUDY RUBRIC FOR SCORING LINGUISTIC EXPLANATIONS

3 Points: The student provided a linguistic explanation that effectively summarized all steps of the problem in a logical manner.

2 Points: Either the student provided a logical explanation that failed to address one important step of the problem, or the student provided an explanation that did address all steps of the problem but was not entirely logical.

1 Point: Either the student provided effective labels and descriptors but no explanation, provided a minimal explanation that failed to address most of the steps of the problem, or provided an explanation that was not logical or did not support the computational work or solution.

0 Points: The student did not provide any text to support the computational work.


APPENDIX D
SAMPLE STUDENT WORK FROM PILOT STUDY TO ILLUSTRATE SCORING


SAMPLE STUDENT WORK FROM PILOT STUDY TO ILLUSTRATE SCORING

All samples are taken from the Jean's class problem, which follows: In Jean's class, there are twice as many boys as girls. If there are 10 girls in the class, how many boys and girls are there in the class? Show your work, then use words or pictures to explain how you got your answer.

The following response scored a 3 for performance and a 0 for linguistic explanation. The student clearly understood the problem situation and arrived at a correct answer, but the response could not be given a score of 4 since it was lacking an explanation. The total lack of writing resulted in the 0 score for linguistic explanation.

The following response earned a performance score of 4 because the student demonstrated a thorough understanding of the mathematics concepts embodied in the task, provided a correct response, and provided some text to explain the procedures used. The response scored a 1 for linguistic explanation because the student provided only a limited amount of text in the form of labels and minor descriptors.


The following response earned a performance score of 2. Although the student used a correct approach to determining the solution, the work contains an error related to misunderstanding an important aspect of the task, namely finding the total number of boys and girls. The linguistic explanation earned a score of 2 because the student explained one of the major strategies used to solve the problem, but omitted an important step.


APPENDIX E
FCAT ACHIEVEMENT LEVELS


FCAT ACHIEVEMENT LEVELS

Achievement levels describe the success a student has achieved on the Florida Sunshine State Standards tested on the FCAT. Achievement levels range from 1 to 5, with Level 1 being the lowest and Level 5 being the highest.

Level 5: This student has success with the most challenging content of the Sunshine State Standards. A student scoring in Level 5 answers most of the test questions correctly, including the most challenging questions.

Level 4: This student has success with the challenging content of the Sunshine State Standards. A student scoring in Level 4 answers most of the test questions correctly, but may have only some success with questions that reflect the most challenging content.

Level 3: This student has partial success with the challenging content of the Sunshine State Standards, but performance is inconsistent. A student scoring in Level 3 answers many of the test questions correctly but is generally less successful with questions that are the most challenging.

Level 2: This student has limited success with the challenging content of the Sunshine State Standards.

Level 1: This student has little success with the challenging content of the Sunshine State Standards.

Source: State of Florida Department of Education (2004), Understanding FCAT Reports 2004.


APPENDIX F
PILOT STUDY


PILOT STUDY

Purpose

The purpose of the pilot study was primarily to ensure that the testing instrument, testing procedures, and scoring/coding procedures would produce the data necessary to answer the five research questions. Additional purposes for the pilot study included ensuring that the process of obtaining informed consent would result in a sufficient sample size, that the ability grouping process would accurately and efficiently categorize the sample students into groups, and that the follow-up interviews with selected students would serve as a member check to verify students' problem-solving processes.

Goals for Piloting the Test Instrument, Test Procedures, and Scoring/Coding Procedures

Test Instrument

Regarding the test instrument, the researcher wished to use information garnered from the small-scale pilot administration of the test to determine the following:
1) Appropriateness of the length of the 12-item constructed-response test,
2) Appropriateness of the range of difficulty of the individual test items,
3) Appropriateness of the order of the test items,
4) Quality and quantity of the written work provided by students,
5) Diversity of responses that the test items would generate,
6) Ability of the self-efficacy and self-assessment scales to capture students' thinking during the testing event.

A maximum test length of one hour was deemed to be appropriate for upper-elementary students, given that mathematics lessons are generally one hour in length and recognizing that a test in excess of one hour may result in student fatigue and lead to spurious results. Regarding item difficulty, the goal was to include some items that would likely be solved correctly by most students, some that would be correctly solved by some students, and some items that would be successfully solved only by the highest-level students. The easier items would give all students a chance to experience some success, and the more difficult items would allow the scores to differentiate between students. The test items were ordered beginning with some of the easier items to encourage students to become engaged in the test. The remaining items were spaced to allow for a variety of mathematical and reading difficulty, content, and context throughout the test. Because one of the research questions involves gender differences in children's problem-solving processes, and another involves the quality of students' linguistic explanations, the test items must elicit enough written work and explanation from the students for the researcher to be able to draw conclusions to answer these two questions. In order to differentiate between students' problem-solving processes, the test items must elicit a diverse set of responses that are clearly differentiated. The first test item, the flower pot problem, was
chosen knowing that it was below grade level in difficulty and would not likely result in a wide variety of student responses. Its primary purpose was to provide students with an entrance point into the test. Unlike the other test items, its value will not be judged based on the diversity of student responses. Finally, the self-efficacy and self-assessment scales must be readily understood by students and effectively integrated into their testing experience. Students must be able to differentiate between the faces and be able to select the face that correctly represents their feelings at the appropriate time.

Testing Procedures

The desired goals for piloting the testing procedures included the following:
1) Assess whether using a sample test item to introduce the test format, self-efficacy scale, and self-assessment scale will sufficiently prepare students for the test administration,
2) Determine what testing location (in the classroom or in a separate room) would best serve the test administration.

Scoring/Coding Procedures

The goals for piloting the scoring procedures include the following:
1) Practice using the FCAT holistic four-point rubric for scoring test items,
2) Use the rubric scoring process to identify anchor papers: responses to each test item that exemplify the quality of an answer earning a score of 4, 3, 2, 1, or 0,
3) Use the researcher-created three-point scoring rubric for assessing the quality of students' linguistic explanations to determine the extent to which the rubric promotes both convergence of responses within a score and divergence of responses between scores,
4) Practice coding students' problem-solving processes to determine the extent to which responses can be classified into meaningful categories.

Design

The intent of the study is to answer the five research questions using a non-experimental, causal-comparative design with both qualitative and quantitative methods. The three independent variables are gender (2 levels), reading ability (2 levels: competent or low), and mathematics ability (2 levels: competent or low). The five dependent variables for each participant include the test score as assessed with the FCAT four-point rubric, the overall score for quality of linguistic explanations as assessed with a researcher-created three-point rubric, a categorical score relating to the student's choice of process for solving each problem, a measure of self-efficacy for each test item, and a measure of self-assessment for each test item.

Population and Sample Selection

The population that this pilot study sampled consisted of the students from one fifth-grade and one fourth-grade class at an elementary school in Florida.


The full study will take place in the first half of the school year with fifth-grade students, while the pilot study was conducted at the end of the school year. The concern with limiting the pilot to fifth-grade students was that the students available at the time of the pilot had benefited from nearly an entire year of fifth-grade mathematics instruction, whereas the students in the full study will have completed only about a fourth of the fifth-grade mathematics curriculum; the pilot sample might therefore not have been sufficiently representative of the full study sample. For that reason, the decision was made to conduct the pilot with end-of-year students from both the fourth and fifth grades to obtain a better measure of the responses the full study participants might produce.

Parental permission was obtained through informed consent documents approved by the Institutional Review Board of the University of South Florida. Approximately half of the students obtained parental approval, resulting in a total of 10 fourth-grade (7 female and 3 male) and 13 fifth-grade (7 female and 6 male) participants.

The two teachers informally evaluated their students' reading and mathematics ability by rating them as high, middle, or low for each of the two subject areas. Prior-year FCAT scores for mathematics and reading were also obtained. The teacher ratings were assigned a numerical value of high = 3, middle = 2, and low = 1. The FCAT scores were also assigned a numerical value: 3 (high) for an FCAT score of 5 or 4, 2 (middle) for an FCAT score of 3, and 1 (low) for an FCAT score of 2 or 1. Each student's two scores (teacher rating and FCAT) were averaged to obtain a categorization of either high, middle, or low for the
purposes of this study. Students whose averages resulted in either a 1.5 or a 2.5 score were categorized based on the teacher rating. The result of the categorization system was a total of 18 groups (2 levels of gender x 3 levels of reading ability x 3 levels of mathematics ability). Table 38 below shows the number of students in each group.

Table 38
Categorization of Pilot Students

                                 Mathematics Ability
Reading Ability     High                  Middle                Low
High                Males-4, Females-6    Males-0, Females-2    Males-0, Females-0
Middle              Males-0, Females-2    Males-4, Females-2    Males-0, Females-1
Low                 Males-0, Females-0    Males-0, Females-0    Males-1, Females-1

Instrumentation

The test to be administered to the participants includes 12 constructed-response mathematics problem-solving items. A copy of the test is provided in Appendix H. A detailed description of the test, its development, and its validity is included in Chapter 3, Methods.
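For readers who want the averaging and tie-breaking rule from the Population and Sample Selection discussion above in a more operational form, the short sketch below shows one way it could be computed. It is an illustration only, not part of the original study materials; the function and variable names are the editor's, and the rule itself is simply the one stated above (teacher rating and FCAT level each mapped to 1-3, averaged, with .5 averages resolved by the teacher rating).

# Minimal sketch of the pilot ability-categorization rule described above.
# Assumptions: teacher ratings are "H"/"M"/"L"; FCAT achievement levels are integers 1-5.

TEACHER_POINTS = {"H": 3, "M": 2, "L": 1}

def fcat_points(level: int) -> int:
    """Map an FCAT achievement level (1-5) to 3 (high), 2 (middle), or 1 (low)."""
    if level >= 4:
        return 3
    if level == 3:
        return 2
    return 1

def categorize(teacher_rating: str, fcat_level: int) -> str:
    """Average the two converted scores; resolve .5 averages with the teacher rating."""
    teacher = TEACHER_POINTS[teacher_rating]
    average = (teacher + fcat_points(fcat_level)) / 2
    if average in (1.5, 2.5):          # tie: defer to the teacher's judgment
        average = teacher
    return {3: "high", 2: "middle", 1: "low"}[round(average)]

# Example: a student rated "M" by the teacher with an FCAT level of 5
# averages (2 + 3) / 2 = 2.5, so the teacher rating places the student in "middle".
print(categorize("M", 5))  # -> "middle"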

PAGE 390

Reliability

The reliability of the pilot test was assessed through a measure of internal consistency. The Cronbach alpha coefficient for mathematical performance scores on the test was .89, a value considered satisfactory following the guidelines of Guilford and Fruchter (1978). The Cronbach alpha for linguistic explanations was slightly lower at .75. The Cronbach alpha coefficients for self-efficacy and self-assessment were lower, .55 and .68 respectively, primarily due to missing data.

Data Collection

In order to fulfill the goals of the pilot study, the study test was first administered to a group of fourth-grade students (n=10, 7 female and 3 male). One week following that administration, the test was given to a group of fifth-grade students (n=13, 7 female and 6 male). The purpose of the delay was to allow time for any changes in the test instrument's length or level of difficulty deemed necessary after the first administration. The student work for each question was evaluated for performance based on the FCAT four-point rubric (see Appendix A for a copy of the scoring rubric), and for quality of the linguistic explanation based on a researcher-developed three-point rubric (see Appendix B for a copy of the scoring rubric). Students' indications of self-efficacy and self-assessment on the four-face scale were converted to a numerical value ranging from one to four.

Participation in the study was voluntary, and parental permission through informed consent was obtained in advance for each participant (see Appendix H
for consent forms). Informed consent was obtained through a two-step process. First, a brief letter from the principal describing the study was sent home with all students in one fourth-grade class and one fifth-grade class (see Appendix I for a copy of the letter). At the bottom of the letter was a place for a parent to indicate "no" if they were not willing for their child to participate, or "yes" if they were interested in having their child participate and would like a copy of the informed consent documents sent home. For those parents who indicated "yes," a packet containing a cover letter from the researcher and two copies of the informed consent document was sent home. All fourth-graders whose parents initially said yes returned a signed copy of the informed consent. All but one fifth-grade student whose parents said yes returned a signed copy of the informed consent.

Organization of Testing

The fourth-grade administration of the test was conducted in the regular classroom while non-participating students wrote in their journals or read silently. Participants sat at round tables with up to four students at each table. The fifth-grade administration of the test was conducted in a multi-purpose room with students sitting at individual desks. Before the test began, each student was given a copy of the test booklet and asked to enter their name, teacher name, school, and gender on the cover page. When all students were finished, they were instructed to turn to the first question of the test, a practice item, and were asked to read the question silently. Next, the researcher explained the format of the test page, indicating that the remaining twelve test items were formatted in
the same fashion. The instructions for the self-efficacy scale were read aloud by the researcher and explained. The students were then led in an open discussion about what circumstances would result in them choosing each of the four faces. Students were encouraged to ask questions if they did not understand. Next, the instructions for the self-assessment scale were read aloud by the researcher and explained. The students were then led in an open discussion of what level of performance would result in them choosing each of the four faces. Students were instructed to show all their work on all questions, and to turn in their test booklet to the researcher when they were finished. No time limit was discussed or imposed. The only questions the researcher answered during the test administration were those pertaining to the test format or scoring of the self-efficacy or self-assessment scales. As each student completed the test, the total testing time was recorded on his/her test booklet.

Analysis of Data

Upon data collection from the fourth-grade students, the researcher performed a cursory review of the students' work and determined that while one student was highly unsuccessful, most students were able to attempt most of the test items, indicating an appropriate level of difficulty. In addition, since all students completed the test in less than 45 minutes, the test length was confirmed as being acceptable. For these reasons, no adjustments were made to the test before the fifth-grade administration.

PAGE 393

After the test was administered, each student response was first evaluated using the FCAT four-point rubric. Student responses that were indicative of each possible score were summarized and recorded to ensure consistency in scoring. The linguistic explanations were then scored using a three-point rubric. Students' indications of self-efficacy and self-assessment were converted to numerical scores, and all scores were entered at the top of the page for each question. Descriptive statistics were calculated to summarize the data. The researcher then reviewed the mean scores for performance to determine the appropriateness of the order of the questions. The biggest concern was whether students would continue to expend effort after dealing unsuccessfully with a difficult question. When all scoring was completed, all student responses for the first question were sorted according to the process used to solve the problem. Categories were continually reviewed to ensure both convergence and divergence. Each process category was summarized and assigned a numerical value that was entered at the top of each test page, and frequencies for each process category were tallied. No judgments were made as to the superiority of one process over another. This process was repeated for each of the twelve questions.

Upon completion of preliminary data analysis, four students (2 male and 2 female) were selected for follow-up interviews. Because the primary purpose of the interview was to ensure that the researcher had correctly interpreted the
students' work when categorizing the processes, the most critical students to interview were those for whom there existed the greatest possibility of categorization error. For this reason, the students selected for interview were those who scored at least an average of two points on the FCAT four-point rubric but scored the lowest for linguistic explanation. The one boy and one girl in each of the fourth-grade and fifth-grade groups who best fit these criteria were selected for interview. One additional boy and girl from each grade were also selected in case the first-choice student was absent on the day of the interview. Student interview data were reviewed to determine the extent to which the researcher accurately categorized the students' work, and the extent to which the student explanations for selection of faces for self-efficacy and self-assessment resulted in both convergence among and divergence between the ratings.

Follow-Up Interviews

Interviews were conducted with four students, one girl and one boy each from fourth and fifth grade. The goals for the interviews were to verify the researcher's interpretation of the process used to solve each problem, and to ensure that the students had correctly interpreted the four-face scales for self-efficacy and self-assessment. All interviews were videotaped and later transcribed. The interviews were held privately in the school's conference room with only the student and the researcher present.

PAGE 395

The interview began with the researcher explaining the purpose of the interview and the need for the video camera. Students were asked to give their consent before the actual interview began, and all students consented. During the interview, the student was presented with his/her own test paper and asked to read the first test question aloud. The student was asked why he/she circled the particular face for the self-efficacy measure. The researcher then asked the student to take as much time as necessary to re-familiarize him/herself with his/her work, then to explain to the researcher what the student did. To conclude each question, the researcher asked the child to explain why he/she chose the particular face for the self-assessment scale. All twelve questions were handled in the same fashion, resulting in an interview of 35-40 minutes in duration.

Transcription from videotape was a cumbersome process, but the video images provided helpful information that aided the researcher in recognizing what specific part of the student's work was being discussed. For the full study, the interviews will be both audiotaped and videotaped. The audiotape will be used to produce a transcript of dialogue. The videotape will be used to supplement the transcript with additional descriptions of the action taking place during the interview.

PAGE 396

Results

Results Involving the Test Instrument and Pilot Procedures

Test Length, Item Difficulty, and Item Placement

The mean amount of time students required to take the test was 33 minutes, with a range of 16 minutes to 56 minutes. This length was considered appropriate, confirming the use of twelve items for the test. The mean performance score for all student responses for all twelve questions was 2.54 on a four-point scale, with a standard deviation of 1.24. Table 39 presents the mean and standard deviation of the scores for each of the twelve questions, in their order of appearance on the test. The flower pot problem was thought to be the easiest problem on the test based on NAEP data, and it proved to be the highest-scoring item with the lowest standard deviation on the pilot test. Students had the most difficulty with the string problem involving division of fractions; surprisingly, according to NAEP data this problem was only of mid-range difficulty. The range of mean scores (1.39 to 3.35) was deemed acceptable for the purposes of this study. In addition, students performed relatively well (M=2.70) on the balloon problem that immediately followed the most difficult problem. This indicates that students were not overly discouraged by the difficult problem and were able to continue to give a strong effort; hence, its placement was appropriate. The birdseed problem proved to be the second most difficult problem on the test (M=1.96), and was
placed at the end of the test following an item on which students performed relatively well (M=2.65). This indicates that fatigue was not likely a factor in the score of this item, and confirms its placement.

Table 39
Pilot Study Summary of Performance Rubric Scores by Item

Test Question     M      SD
Flower pot        3.35   0.57
Ms. Hernandez     2.87   1.14
Calories          2.30   1.06
Jean's class      2.61   1.08
Pizza             2.48   1.24
String            1.39   1.20
Balloons          2.70   1.29
Flour             2.65   1.30
Calculator        2.70   1.40
School lunch      2.78   1.40
Pencils           2.65   1.30
Birdseed          1.96   1.25

Note. n=23; the maximum possible score for each item was 4.0.

Students' Written Work

A vast majority of the students showed their work for each problem, making categorization of students' problem-solving processes possible. The
diversity of student processes varied according to problem, with a low of four discernible problem-solving paths for the flower pot problem, and a high of ten different paths for solving the balloon problem. This level of diversity will allow for analysis of gender differences in the full study, though the greater sample size will likely result in a greater number of problem-solving processes and necessitate the combination of similar processes. The follow-up interviews confirmed the researcher's categorization of students' problem-solving processes.

Self-Efficacy and Self-Assessment Scales

The follow-up interviews were the only true source of information as to whether students understood and correctly employed the four-face scales for self-efficacy and self-assessment. Transcripts from the four interviews were reviewed, and students' explanations for choosing each of the four faces were copied and pasted on a list according to face. Student explanations for choosing the happiest face were fairly consistent (convergent) and strongly differentiated from explanations for other faces (divergent). The same was true for the saddest of the four faces. The researcher discovered a problem with the middle two faces in that students' comments for each of the faces were similar. Students viewed both the slightly-happy and slightly-sad faces as representing partial success and partial failure. The decision to use only four faces was originally made to avoid students' potential over-selection of a neutral, middle face. As a result of the pilot study, and in keeping with other similar research (Boekaerts et
al., 1995; Vermeer et al., 2000), the decision was made to include a fifth, neutral face in the self-efficacy and self-assessment scales for the full study.

Testing Procedures

For both test administrations, the students appeared to grasp the test format and the purpose and procedure for the self-efficacy and self-assessment scales. Six students neglected to mark one or more self-efficacy or self-assessment scores, resulting in 17 missing scores out of a total of 552 scores (3.08%). Nine of the seventeen missing scores were from one student, the same student who was identified as an outlier during the data analysis. If this student's scores were eliminated from the analysis, the percentage of missing scores would drop to 1.5%. In an effort to reduce this percentage for the full study, the researcher will remind students during the test administration to be sure to mark both scales for each question. Regarding test location, although both pilot locations were acceptable, the quiet and lack of distractions experienced in the multi-purpose room provided an atmosphere more conducive to student concentration. When possible, students will be tested in a separate room for the full study.

Scoring Procedures

The FCAT four-point rubric proved to be an efficient scoring system that established clear categories for each score value. Qualities of answers given each score were recorded for use in scoring test items in the full study. Use of
the researcher-created scoring rubric for linguistic explanations resulted in a lack of clarity within each score category. The researcher made anecdotal notes throughout the scoring process of difficulties encountered with the rubric wording. The notes were then used to produce the amended version of the scoring rubric found in Appendix B. The coding process for categorizing students' problem-solving processes produced meaningful categories.

Informed Consent Process

The process used for the pilot study to obtain parental permission was less than satisfactory. Only half of the potential participants obtained parental permission, a number that may not yield a large enough sample to detect any differences that may exist between groups. For this reason, the researcher has amended the process for obtaining parental permission for the full study. To begin, most of the parents who responded to the principal's introductory letter indicated their willingness to have their child participate, and nearly all of these parents later signed the informed consent document. However, only slightly more than half of the parents responded to this letter. The researcher concluded that the difficulty was not so much in the parents' lack of willingness to have their child participate as in the children's failure to give the letter to their parents and return the letter to school.

For the full study, the introductory letter will be attached to the informed consent documents, reducing the number of times students must be responsible for paperwork. The introductory letter will explain the attached informed consent
and will also include a place for a parent to sign indicating their lack of willingness to have their child participate. The researcher will provide each teacher with a non-edible incentive (not to exceed 50 cents in value) for every child who returns to school either a signed informed consent document giving parental permission, or a signed introductory letter refusing parental permission. The researcher believes that these procedural changes will result in a higher percentage of participation.

Results of the Data Analysis

Individual student means for performance ranged from 0.42 to 3.50 on a four-point scale, with a standard deviation of 0.80. The lowest score represents an outlier, with the next lowest score mean at 1.25. The student with the lowest score completed the test in 16 minutes, and the teacher indicated that the student had not put forth any effort on the test. As can be seen in Table 40, girls outscored boys on the measure of performance. An item analysis reveals that girls outscored boys on all questions except the birdseed problem, the one that NAEP data shows to be the most difficult. Girls also performed better on the measure of linguistic explanations, but boys' and girls' measures of their own performance through the self-assessment scale were nearly identical. Girls were slightly more self-efficacious, though boys' self-efficacy standard deviation exceeded that of girls. Table 41 is provided as a summary of the pilot results on a per-item basis. The results indicate that students were consistently overconfident on all test
questions. The mean scores for self-assessment were higher than the mean scores for self-efficacy for ten of the twelve questions, indicating that students tended to be more confident of their finished product than they were of their ability to correctly solve the problem before they began working.

Conclusion

In summary, the pilot study showed the viability of the test instrument for producing the data necessary to answer the research questions. Weaknesses revealed in the pilot study that have resulted in procedural changes for the full study include the rubric for linguistic explanations, the process for obtaining informed consent, the use of only four faces for the self-efficacy and self-assessment scales, the use of three levels of reading ability and mathematics ability, and the use of video alone to record the student interviews. The writing rubric has been amended to clarify each score point. The process for obtaining informed consent has been streamlined and now includes an incentive for students. The self-efficacy and self-assessment scales now include a fifth, neutral face in the middle. Both the reading and mathematics ability groupings have been altered to include only two levels instead of three. Having only two levels of ability should allow all cells to contain a large enough sample for meaningful analysis. A full description of the revised categorization process is included in Chapter Three.


Table 40
Summary of Pilot Data by Gender

          Performance     Linguistic Explanations     Self-Efficacy     Self-Assessment
Gender    M      SD       M      SD                   M      SD         M      SD
Girls     2.67   1.21     1.21   1.16                 3.46   0.79       3.45   0.87
Boys      2.33   1.26     0.73   1.09                 3.35   0.86       3.47   0.86

Note. n=23. The maximum score for performance, self-efficacy, and self-assessment was 4.0. The maximum score for linguistic explanations was 3.0.


Table 41
Summary of Pilot Data by Item

Item             Performance   Ling. Ex.   SE     SA
Flower pot       3.35          1.04        3.83   3.83
Ms. Hernandez    2.87          0.74        3.61   3.78
Calories         2.30          0.48        3.59   3.18
Jean's class     2.61          0.96        3.83   3.86
Pizza            2.48          2.09        3.62   3.70
String           1.39          0.30        2.73   2.74
Balloons         2.70          0.61        3.36   3.43
Flour            2.65          0.65        3.50   3.52
Calculator       2.70          2.30        3.05   3.14
School lunch     2.78          0.83        3.77   3.82
Pencils          2.65          1.74        3.33   3.65
Birdseed         1.96          0.57        2.72   2.87

Note. n=23; the maximum score for performance, self-efficacy, and self-assessment was 4.0. The maximum score for linguistic explanations was 3.0. SE = self-efficacy, SA = self-assessment, Ling. Ex. = linguistic explanations.
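The two per-item patterns noted above, self-efficacy means exceeding performance means on every item and self-assessment means exceeding self-efficacy means on ten of the twelve items, can be re-tallied directly from the printed values in Table 41. The short sketch below does only that; it is an editorial illustration rather than the study's analysis code, and the self-efficacy/performance comparison is approximate because the performance rubric runs 0-4 while the face scale was converted to 1-4.

# Illustrative re-tabulation of the Table 41 means above (not part of the original
# analysis). Each tuple is (item, performance, self_efficacy, self_assessment);
# linguistic-explanation means are omitted because they use a different (0-3) scale.
items = [
    ("Flower pot",    3.35, 3.83, 3.83),
    ("Ms. Hernandez", 2.87, 3.61, 3.78),
    ("Calories",      2.30, 3.59, 3.18),
    ("Jean's class",  2.61, 3.83, 3.86),
    ("Pizza",         2.48, 3.62, 3.70),
    ("String",        1.39, 2.73, 2.74),
    ("Balloons",      2.70, 3.36, 3.43),
    ("Flour",         2.65, 3.50, 3.52),
    ("Calculator",    2.70, 3.05, 3.14),
    ("School lunch",  2.78, 3.77, 3.82),
    ("Pencils",       2.65, 3.33, 3.65),
    ("Birdseed",      1.96, 2.72, 2.87),
]

# Overconfidence: items where the mean self-efficacy rating exceeds the mean performance score.
overconfident = sum(1 for _, perf, se, _ in items if se > perf)

# Hindsight confidence: items where mean self-assessment exceeds mean self-efficacy.
sa_above_se = sum(1 for _, _, se, sa in items if sa > se)

print(overconfident)   # -> 12 (all twelve items)
print(sa_above_se)     # -> 10 (ten of the twelve items)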


APPENDIX G
TEACHER RATING SHEET


TEACHER RATING SHEET

Please indicate these students' level of reading and mathematics ability by circling H (high ability), M (average ability), or L (low ability). Base your evaluation strictly on your professional judgment, not test scores. When evaluating reading, focus on the child's level of reading comprehension. When evaluating mathematics, focus on the child's general mathematics ability, including computation and problem solving.

SCHOOL ________________________          TEACHER _____________________

STUDENT NAME                             READING ABILITY     MATH ABILITY
                                         H   M   L           H   M   L
                                         H   M   L           H   M   L
                                         H   M   L           H   M   L
[identical rating rows repeat for each remaining student on the class roster]


APPENDIX H
PARENTAL INFORMED CONSENT


Parental Informed Consent
Social and Behavioral Sciences
University of South Florida

Information for People Whose Children Are Being Asked to Take Part in a Research Study

The following information is being presented to help you decide whether or not you want to allow your child to be a part of a minimal risk research study. Please read this carefully. If you do not understand anything, ask the person in charge of the study.

Title of research study: The Impact of Gender, Reading Ability, and Mathematics Ability on Children's Mathematical Problem Solving

Person in charge of study: Patricia D. "Trish" Hunsader

Where the study will be done: The study will be done at xxx The study will be done during the time your child is normally in school. Your child is being asked to participate because he/she is a fifth-grade student at xxx

General Information about the Research Study

The purpose of this research study is to find out how boys and girls differ in how they solve mathematical problems, their confidence in their ability to solve problems, and their ability to assess their own work.

Plan of Study

To begin, every child's identity will be kept confidential throughout this study. This fall I will be coming to your child's school to administer a twelve-item math problem-solving test to all of the fifth-grade students whose parents agree to have their children participate. All twelve questions require students to show their work. Before solving each problem, your child will be asked to rate how they feel about the question by circling one of five smiley/sad faces. After they have completed their work on each question, they will rate how well they feel they did on the item by again circling one of five smiley/sad faces. The test should take about one hour to complete.

Some time before the test, I will record your child's previous year's FCAT scores for math and reading. If your child did not take the FCAT last year, I will record whatever standardized scores are available. Also, your child's teacher will be asked to provide an informal evaluation of your child's reading and math ability. These measures will allow me to look at the relationship between children's math and reading ability and their problem-solving performance on my test.


After I have had a chance to review all the children's work, I will be selecting 8-12 students to interview based on their pattern of answers on the test. The purpose of the interviews is to better understand what the children were thinking as they completed the test items. Each student will be interviewed separately in a private room, and the interview will last approximately 40 minutes. The interviews will take place during school hours at a time determined by the classroom teacher. During the interview, the student will be shown a copy of his/her original work on the test and will be asked to talk through their thinking on each test question. All interviews will be video-taped and audio-taped. The audiotape will be transcribed by a transcriptionist who will not know your child's identity. I will review the videotapes to add details about the interview that cannot be picked up by the audio recording, such as where your child is pointing on the paper, or when your child nods his/her head instead of saying yes or no. Both the video and audio tapes will be stored in a secure location.

Payment for Participation

You and your child will not be paid for your child's participation in this study.

Benefits of Taking Part in this Research Study

By taking part in this study, your child will have the benefit of practicing mathematical problem solving and explaining their solutions to problems. The results of this study will add to the knowledge available to educators about mathematical problem solving.

Risks of Being a Part of this Research Study

There are no known risks associated with participation in this study.

Confidentiality of Your Child's Records

You and your child's privacy and research records will be kept confidential to the full extent required by law. Authorized research personnel, employees of the Department of Health and Human Services, and the USF Institutional Review Board and its employees, its staff, and any other individuals acting on behalf of USF, may inspect the records from this research project. The results of this study may be published. However, the data obtained from your child will be combined with data from other children in the publication. The published results will not include your child's name or any other information that would personally identify your child in any way. Your child will be assigned a code number before the study begins, and that number, not your child's name, will appear on the testing materials that are viewed by other research personnel. I, as principal investigator, am the only person who will have access to the list that matches your child's name to his/her code number. Fictitious names will be used in any articles written about this study. All information about your child gathered from this study will be kept in my private home in a secured cabinet. The only people who will see this information are the research team and the university faculty who are giving advice on the study.


Appendix H (Continued)

If your child is selected for an interview, I am the only person who will see the videotape. The typed record of the interview will refer to your child by his/her code number, not his/her name. The audiotapes will also be labeled with only a code number, and will be reviewed only by me, the transcriptionist, and other research personnel. Both tapes will be stored in a secure cabinet at my home.

Volunteering to Take Part in this Research Study
Your decision to allow your child to participate in this research study is completely voluntary. You are free to allow your child to participate in this research study or to withdraw him/her at any time. If you choose not to allow your child to participate, or if you remove your child from the study, there will be no penalty and your child's school grades will not be affected in any way. Your decision to allow your child to participate (or not to participate) in this research study will in no way affect his/her status at school. If your child is selected to be interviewed, he/she will be told at the beginning of the interview that he/she can decide to stop the interview at any time and for any reason and return to the classroom.

Questions and Contacts
If you have any questions about this research study, contact Patricia Hunsader at 941-xxxxxxxx (home), 813-974-1034 (USF), or 941-xxxxxxxx (cell), or email me at phunsader@aol.com. If you have questions about your rights as a person who is taking part in a research study, you may contact the Division of Research Compliance of the University of South Florida at (813) 974-5638.

Consent for Child to Take Part in this Research Study
I freely give my consent to let my child take part in this study. I understand that this is research. I have received a copy of this consent form.

________________________   ________________________   ___________
Signature of Parent         Printed Name of Parent      Date
of child taking part in study

Investigator Statement: I certify that participants have been provided with an informed consent form that has been approved by the University of South Florida's Institutional Review Board and that explains the nature, demands, risks, and benefits involved in participating in this study. I further certify that a phone number has been provided in the event of additional questions.

_________________________   _________________________   _____________
Signature of Investigator   Printed Name of Investigator   Date


Appendix H (Continued)

Child's Assent Statement
Patricia Hunsader has explained to me this research study called "The Impact of Gender, Reading Ability, and Mathematics Ability on Children's Mathematical Problem Solving." I agree to take part in this study.

________________________   ________________________   ___________
Signature of Child          Printed Name of Child       Date
taking part in study

________________________   ________________________   ___________
Signature of Parent         Printed Name of Parent      Date
of child taking part in study

________________________   ________________________   ___________
Signature of person         Printed Name of person      Date
obtaining consent           obtaining consent

________________________   ________________________   ___________
Signature of Witness        Printed Name of Witness     Date


APPENDIX I
INFORMED CONSENT COVER LETTER


Date

Dear Parent,

My name is Patricia "Trish" Hunsader, and I am a doctoral student at the University of South Florida studying elementary mathematics education. I will be conducting my doctoral dissertation study with the fifth-grade students at XXXX, XXXXXX, and XXXX Elementary schools this fall. The purpose of my study is to determine the impact of gender, reading ability, and mathematics ability on children's mathematical problem solving. The success of this study, and my ability to draw conclusions that will help teachers in the future, depends largely on the number of children who participate.

In order for your child to participate, I must obtain your signed permission on the attached Informed Consent document. Two copies of the Informed Consent are attached so that you can sign one and return it to school, and keep one for your records. The Informed Consent contains a detailed explanation of the study, and I encourage you to read it in its entirety. As the Informed Consent explains, the study consists of a one-time, twelve-item problem-solving test that will take approximately one hour to complete. The test questions are similar in format to the FCAT test, and will provide your child with an opportunity to practice problem solving. A small number of students (a total of 8-12 from the three schools) will be asked to complete a 30-40 minute follow-up interview to talk about their problem-solving processes.

Your child's identity will be strictly protected. No individual children's names, nor even the school names, will appear in print in any published material. Also, rest assured that this study is being conducted with the full knowledge and consent of your child's teacher, the school principal, and the University of South Florida's Division of Research Compliance.

If you are willing to allow your child to participate, please provide signatures on one copy of the attached Informed Consent where highlighted on pages 3 and 4, then have your child return the form to his/her teacher. The second copy of the Informed Consent is for you to keep. If you are not willing to allow your child to participate, please enter your child's name and your signature in the spaces at the bottom of this page.

Warmest Regards,

Patricia D. Hunsader

No, I am not willing to allow my child to participate in the study.

_____________________________   ________________________________
Child's name                    Parent/Guardian signature


APPENDIX J
PROBLEM-SOLVING TEST INSTRUMENT


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Name ___________________
Teacher _________________
School __________________
I am a girl _____  boy _____


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Missy and Bob wanted to compare their baseball card collections. Missy had 42 baseball cards, and Bob had 31 cards. How many more cards did Missy have in her collection? Show your work, then explain how you know your answer is correct.

Work space:

Answer: ________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Flowerpot Problem
The picture shows the flower pots in which Kevin will plant seeds. He needs 3 seeds for each pot. How many seeds will Kevin need for all of the pots? Show your work, then explain how you know your answer is correct.

Work space:

Number of seeds needed ________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Ms. Hernandez Problem
Ms. Hernandez formed teams of 8 students each from the 34 students in her class. She formed as many teams as possible, and the students left over were substitutes. How many students were substitutes? Show your work, then explain how you know your answer is correct.

Work space:

Answer: _________________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Calories Problem
Cheeseburger: 393 Calories
Hot Dog: 298 Calories
Yogurt: 214 Calories
Cookie: 119 Calories

Which two of the items above would provide a total of about 600 calories? Show your work, then explain how you know your answer is correct.

Work space:

Answer: ___________________ and ___________________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Jean's Class Problem
In Jean's class there are twice as many boys as girls. If there are 10 girls in the class, how many boys and girls are there in the class? Show your work, then explain how you know your answer is correct.

Work space:

Answer: ________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Pizza Problem
Think carefully about the following question. Write a complete answer. You may use drawings, words, and numbers to explain your answer. Be sure to show all of your work.

Jose ate 1/2 of a pizza. Ella ate 1/2 of another pizza. Jose said that he ate more pizza than Ella, but Ella said they both ate the same amount. Show how Jose could be right, then explain how you know your answer is correct.

Work space:

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

String Problem
Jim has 3/4 of a yard of string which he wishes to divide into pieces, each 1/8 of a yard long. How many pieces will he have? Show your work, then explain how you know your answer is correct.

Work space:

Answer: __________________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Balloon Problem
To solve this problem you must ESTIMATE. Do NOT find the exact answer.

This arch of balloons will be placed at the finish line for a local Walk-A-Thon. The section marked on the picture has about 32 balloons. Based on this information, ESTIMATE the total number of balloons in the arch. Show your work, then explain how you know your estimate is correct.

Work space:

Estimate _______________ balloons

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Flour Problem
If 1 1/3 cups of flour are needed for a batch of cookies, how many cups of flour will be needed for 3 batches? Show your work, then explain how you know your answer is correct.

Work space:

Answer: __________________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Calculator Problem
Think carefully about the following question. Write a complete answer. You may use drawings, words, and numbers to explain your answer. Be sure to show all of your work.

Laura wanted to enter the number 8375 into her calculator. By mistake, she entered the number 8275. Without clearing the calculator, how could she correct her mistake? Show your work, then explain how you know your answer is correct. Without clearing the calculator, how could she correct her mistake another way? Show your work, then explain how you know your answer is correct.

First way:

Second way:

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

School Lunch Problem
Sam can purchase his lunch at school. Each day he wants to have juice that costs 50¢, a sandwich that costs 90¢, and fruit that costs 35¢. His mother has only $1.00 bills. What is the least number of $1.00 bills that his mother should give him so he will have enough money to buy lunch for 5 days? Show your work, then explain how you know your answer is correct.

Work space:

Answer: _________________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Pencil Problem
Together, Sara and Brendan have 20 pencils. Sara says 1/4 of the pencils are hers. Brendan says 15 of the pencils belong to him. Show how they could both be right, then explain how you know your answer is correct.

Work space:

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


DO NOT WRITE IN THIS AREA   Code: ___ ___ ____   SE: ___   SA: ___   MRS: ___   WRS: ___   PROC: ___ ___   P/D: ___

Appendix J (continued)

Birdseed Problem
A package of birdseed costs $2.58 for 2 pounds. A package of sunflower seeds costs $3.72 for 3 pounds. What is the difference in the cost per pound? Show your work, then explain how you know your answer is correct.

Work space:

Answer: ________

Before you begin working, circle the face that best shows how well you think you can solve this problem.

When you are done working, circle the face that best shows how well you think you did on this problem.


APPENDIX K
ITEM-SPECIFIC RUBRICS FOR MATHEMATICS SCORING


Task: FLOWER POT MATH

4 points (MODEL RESPONSE)
Explanation must include context: 20 pots x 3 seeds per pot = 60 seeds (minimum info needed for 4 points). Must include mention/labeling of 20 pots and 3 seeds per pot (or number of seeds per row x number of rows), OR accurate counting by 3s with mention of pots and seeds.

3 points (SUCCESSFUL RESPONSE)
Correct procedure w/o explanation, OR correct procedure w/ explanation but a computational error, OR correct procedure w/ explanation that only refers to the operations used and the numbers involved.

2 points
Correct procedure w/ computational error and no explanation, OR correct computation but incorrect number of pots and no explanation, OR correct procedure with incorrect explanation, OR correct answer w/o supporting work or explanation, OR correct answer w/ flawed explanation and no work.

1 point
Incorrect procedure, incorrect or missing explanation, but some correct work, such as the correct number of pots noted.

0 points
Incorrect procedure and incorrect or missing explanation, OR incorrect answer with no work or explanation.


Appendix K (Continued)

Task: MS. HERNANDEZ MATH

4 points (MODEL RESPONSE)
Correct solution of 2 substitutes with a reasonable, contextualized explanation (includes reference to groups/teams and that the remainder is the number of subs), and work shown. Explanation can have a minor flaw if all steps are covered.

3 points (SUCCESSFUL RESPONSE)
Correct solution with work shown but no explanation, OR correct solution with explanation that lacks context.

2 points
34 ÷ 8 = 4 R 2 with no explanation, answer given as 4 R 2, OR answer given as 4 groups or 8 teams of 4 (failure to answer the question about substitutes), OR 34 ÷ 8 = 4 with no mention or show of the remainder, but correct answer of 2 on the blank, OR correct solution w/ seriously flawed work or no work.

1 point
Only partial grouping by 8s, OR 34 ÷ 8 = 4 and answers that 4 kids were subs, OR incorrect division, but some effort to answer the question, OR some correct work.

0 points
Incorrect computation, no explanation, and no correct work.
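As a check, using only the numbers given in the problem, the division behind the model response is
\[
34 \div 8 = 4 \text{ R } 2, \qquad \text{i.e.} \qquad 34 = 4 \times 8 + 2,
\]
so four full teams of 8 can be formed and the 2 students left over are the substitutes.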


Appendix K (Continued)

Task: CALORIES MATH

4 points (MODEL RESPONSE)
Correct answer of cheeseburger and yogurt with work shown using either exact or rounded numbers. Includes a contextualized explanation that refers to estimation, rounding, or "close" and the target sum.

3 points (SUCCESSFUL RESPONSE)
Added 393 + 214 = 607 with the correct answer of item names, but explanation lacks mention of rounding or getting close, OR 400 + 200 = 600 with the answer correct but the explanation missing, OR added many combinations with the answer correct but the explanation incomplete or missing, OR answer reported as 393 and 214 w/o explanation.

2 points
Correct answer, missing or flawed explanation, and some error(s) in work, OR added pairs of items but said no answer (missed the concept of "about"), OR correct answer unsupported by work, OR correct work but failed to answer the question, OR incorrect answer but logical explanation.

1 point
Doubled the calories of items and chose 2 hot dogs, OR added one or more combinations but gave no answer, OR chose 3 items totaling close to 600 calories, OR added 2 wrong items (not close enough) with no explanation, or an explanation w/o work.

0 points
No work, OR gives more than 2 items as the answer with an incorrect total, OR lists two incorrect items w/ no work or no correct work.
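For reference, the two sums the rubric treats as model work are
\[
393 + 214 = 607 \approx 600 \qquad \text{or, rounded,} \qquad 400 + 200 = 600,
\]
both of which identify the cheeseburger and the yogurt.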


Appendix K (Continued)

Task: JEAN'S CLASS MATH

4 points (MODEL RESPONSE)
Correct answer of either 30 (or 20 boys and 10 girls) with a contextualized explanation that clearly shows how the answer was obtained. Explanation must refer to "double" or "twice" or multiplying by 2.

3 points (SUCCESSFUL RESPONSE)
Correct solution with correct work but no explanation, or an explanation w/o context, OR a minor flaw in the solution process but the explanation is clear and correct.

2 points
Answers the number of boys w/ some explanation or supporting work showing understanding of the relationship between the number of boys and the number of girls, but ignores the question of the total, OR correct answer with no supporting work or explanation.

1 point
Answered 20 (the number of boys) with no explanation or supporting work, OR answered an incorrect number of boys, but some work or explanation correct.

0 points
No response or a completely incorrect response.
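The full chain of computation behind the model response is
\[
10 \times 2 = 20 \text{ boys}, \qquad 20 + 10 = 30 \text{ students},
\]
which is what separates a 4-point answer (total of 30) from a 2-point answer that stops at 20 boys.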


Appendix K (Continued)

Task: PIZZA MATH

4 points (MODEL RESPONSE)
Shows 1/2 of 2 separate pizzas (one big and one small) and says Jose's is bigger, so his 1/2 is bigger, OR explains that if Jose's pizza is bigger, the half he ate would be bigger than Ella's half. The 1/2 must be included in the text or illustrated in the picture.

3 points (SUCCESSFUL RESPONSE)
Explanation w/o a picture and no mention of halves, simply that Jose's pizza could have been bigger, OR a picture that shows Jose's as bigger, but the explanation focuses on the number of pieces in a logical way, OR shows 1/2 of 2 pizzas w/ Jose's bigger, w/o explanation, OR correct response and explanation, except the picture does not show Jose's pizza as bigger, OR no conclusion, but a valid argument not completely supported.

2 points
Concludes they ate the same with either 2 halves of the same pizza or 1/2 of 2 same-size pizzas, OR an erroneous conclusion but with some valid support, OR shows 1/2 of 2 pizzas w/ Jose's larger, but a flawed explanation, OR concludes that Ella could have had 1/2 of a half w/ a supporting diagram.

1 point
Shows 2 halves of the same pizza w/ labeling but no explanation, OR shows 1/2 of 2 pizzas w/ a missing or flawed explanation (Jose's pizza not clearly bigger), OR "Ella ate 1/2 of a 1/2" w/o a supporting diagram, OR says Jose is right if the pizza was cut wrong, but the picture doesn't support the argument, OR "Jose ate more than half", OR "they ate the same" w/o any support.

0 points
No valid work or explanation.
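The size argument the model response relies on can be stated compactly, writing A for the amount of pizza in each whole pie (illustrative notation only): if
\[
A_{\text{Jose}} > A_{\text{Ella}}, \qquad \text{then} \qquad \tfrac{1}{2}A_{\text{Jose}} > \tfrac{1}{2}A_{\text{Ella}},
\]
so half of Jose's larger pizza is more pizza than half of Ella's, even though each ate one half.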


Appendix K (Continued)

Task: STRING MATH

4 points (MODEL RESPONSE)
Divided 3/4 by 1/8 w/ the correct answer of 6 and an explanation w/ context, OR showed 3/4 as equal to 6/8 w/ the correct answer of 6 and an explanation w/ context.

3 points (SUCCESSFUL RESPONSE)
Correct answer with work shown but no explanation (may have a minor flaw in the work), OR correct answer with an explanation but no work shown (may have a minor flaw in the explanation).

2 points
Correct answer w/ no work or explanation, OR correct process with a computation error and an insufficient/flawed explanation, OR correct answer with flawed work and a flawed explanation, OR converts 3/4 to 6/8 but reports the answer as 6/8 or 2/8 (doesn't answer the question), OR finds 3/4 yard = 27 inches and divides that by 8.

1 point
Response shows the student understands the problem but is unable to do the computation, OR correct answer, but the work and/or explanation doesn't support the answer (lucky guess), OR a picture or work showing 6 out of 8 in some fashion w/ 3 out of 4, but nothing else makes sense, OR a guess that is close (5-7) w/o any work (shows some understanding of the problem), OR the student knows division or repeated subtraction is necessary, but doesn't know what to divide or how to do it, OR shows cutting a length (string) into pieces 1/8 long.

0 points
Work doesn't indicate student understanding of the problem or of the work needed to solve it.
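Both routes the rubric accepts lead to the same result:
\[
\frac{3}{4} \div \frac{1}{8} = \frac{3}{4} \times 8 = 6 \qquad \text{or} \qquad \frac{3}{4} = \frac{6}{8},
\]
so the string yields 6 pieces of 1/8 yard each.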


Appendix K (Continued)

Task: BALLOONS MATH

4 points (MODEL RESPONSE)
Estimate is in range (150-280), based on 5-8 sections of 30-35 balloons. Work is shown and the explanation is complete and in context (refers to sections and the number of balloons).

3 points (SUCCESSFUL RESPONSE)
As above with work shown (the number of sections and the number in each section are included in the computation, but no labeling is required) but w/o explanation. Minor computational error okay.

2 points
Estimate in range without indication of a valid strategy for obtaining the estimate, OR marked off sections, valid strategy, but a major computational error, OR a valid strategy but an erroneous conclusion, OR an estimate out of range because either 4 or 9 sections were counted.

1 point
Estimate out of range (<150 or >280) without a valid strategy or explanation, OR shows evidence of an attempt to estimate with a result greater than 32.

0 points
Result less than or equal to 32. No valid strategy or valid explanation.
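The in-range band follows directly from the accepted section counts and per-section estimates:
\[
5 \times 30 = 150 \qquad \text{and} \qquad 8 \times 35 = 280,
\]
so any estimate between 150 and 280 balloons is consistent with 5-8 sections of 30-35 balloons each.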


Appendix K (Continued)

Task: FLOUR MATH

4 points (MODEL RESPONSE)
Correct solution of 4 cups with work shown (either multiplication, addition, or pictorial) and with an explanation that refers to the context of batches and cups.

3 points (SUCCESSFUL RESPONSE)
Correct solution with some work but no explanation, OR correct solution with no work but with an explanation, OR answered 3 3/3 with correct work and explanation.

2 points
Shows understanding of the strategy, but an error in the fraction work (e.g., 3 3/9 for the answer), OR correct answer without valid work or explanation.

1 point
Shows some understanding of the question, but an erroneous strategy for solving (e.g., 1 1/3 + 1 1/3 + 1 1/3 = 1 3/9) with an error in both the whole number and the fraction.

0 points
Work shows little or no understanding of the problem or the process needed to solve it.
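Either accepted route gives the model answer of 4 cups:
\[
1\tfrac{1}{3} \times 3 = \tfrac{4}{3} \times 3 = 4 \qquad \text{or} \qquad 1\tfrac{1}{3} + 1\tfrac{1}{3} + 1\tfrac{1}{3} = 3\tfrac{3}{3} = 4.
\]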


Appendix K (Continued)

Task: CALCULATOR MATH

4 points (MODEL RESPONSE)
2 correct responses clearly explained in drawings, words, or numbers that refer both to what to do (e.g., add 100) and to the result of 8375, OR 1 response like the above and one complete response w/o mention of the result.

3 points (SUCCESSFUL RESPONSE)
2 correct responses that do not explain and/or do not give the result, OR 1 correct, explained response and one with minor flaws or only partially valid.

2 points
1 correct response, with the 2nd response missing or invalid, OR 2 partially valid responses (a response that says to add 100 to the other number, or to add 100 to the answer at the end, is partially valid).

1 point
Both responses show some understanding of the question, but lack place-value understanding (e.g., add 10, or add 20 then subtract 10, or add 1000, or add 500 + 500), OR shows some understanding, but the methods don't produce the correct readout, OR 1 partially valid response.

0 points
No valid response, or a response that does not indicate understanding of the question, OR a response such as "times it by a number", OR a silly response such as "do it on paper".
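The place-value idea behind a fully correct response is that the entered number is exactly 100 too small:
\[
8375 - 8275 = 100, \qquad \text{so} \qquad 8275 + 100 = 8375,
\]
and any combination of added and subtracted numbers whose net effect is +100 also works.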


Appendix K (Continued)

Task: SCHOOL LUNCH MATH

4 points (MODEL RESPONSE)
Correct answer of 9 with valid work shown and an explanation or labeling of all work, with mention of rounding and some context.

3 points (SUCCESSFUL RESPONSE)
Correct answer of 9 and valid work with no explanation, OR correct answer with labeling, but missing steps, OR a minor flaw in computation but the work is clear and the answer is rounded up to the nearest dollar, OR all correct, but rounded a single day to $2, then multiplied by 5 days.

2 points
All computation correct, but failed to round up to the nearest dollar and did not explain, OR correct answer for one day, rounded up to the nearest dollar, OR little work, mostly correct, rounded up to the nearest dollar.

1 point
Shows some understanding of the question but the work is seriously flawed, OR found the total for one day but did not round up.

0 points
Little or no correct work. Doesn't show understanding of the problem situation.
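The computation behind the model response is
\[
(0.50 + 0.90 + 0.35) \times 5 = 1.75 \times 5 = \$8.75,
\]
which must be rounded up to 9 one-dollar bills; stopping at $8.75, or at $1.75 for a single day, is what the lower score points describe.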


Appendix K (Continued)

Task: PENCILS MATH

4 points (MODEL RESPONSE)
Shows and explains that Sara's 1/4 is 5 pencils, which, added to Brendan's 15, totals 20 pencils.

3 points (SUCCESSFUL RESPONSE)
Shows Sara has 5 and Brendan 15, but does not show or explain that 5 pencils = 1/4 of 20 pencils; does show 5 + 15 = 20, OR accurate computation without explanation or context (shows 1/4 of 20 = 5 and 5 + 15 = 20), OR accurate computation of 1/4 of 20 = 5 and 3/4 of 20 = 15, but no context, OR Sara's 1/4 of 20 = 5 and Brendan's is 15, but doesn't total.

2 points
Sara has 5, Brendan has 15; does not relate the 5 to 1/4 or to the total of 20, OR does relate Sara's 1/4 to 5 but stops there.

1 point
Some understanding shown.

0 points
No correct work, OR only restates info in the problem.
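The arithmetic that reconciles the two claims is
\[
\tfrac{1}{4} \times 20 = 5 \qquad \text{and} \qquad 5 + 15 = 20,
\]
so Sara's quarter of the pencils (5) and Brendan's 15 together account for all 20.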


Appendix K (Continued)

Task: BIRDSEED MATH

4 points (MODEL RESPONSE)
Shows $2.58 ÷ 2 and $3.72 ÷ 3, finds the difference of 5 cents, and explains the process in the context of price per pound.

3 points (SUCCESSFUL RESPONSE)
Correctly shows the division but answers 5 without context, OR all work correct and shown but includes a minor computational error, OR shows understanding of finding the price per pound and gives an explanation but has more than one computational error, OR all work correct without explanation.

2 points
Subtracts 3.72 - 2.58 correctly and provides some contextualized explanation, OR finds the price per pound for one or two items, but stops there, OR shows some attempt at finding the price per pound.

1 point
Student subtracts the total prices and explains (or labels) some, but has computational error(s), OR student subtracts 3.72 - 2.58 correctly, but no correct explanation or labeling.

0 points
No correct work.
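The per-pound computation the model response calls for is
\[
\frac{2.58}{2} = \$1.29 \text{ per lb}, \qquad \frac{3.72}{3} = \$1.24 \text{ per lb}, \qquad 1.29 - 1.24 = \$0.05,
\]
a difference of 5 cents per pound; subtracting the package prices (3.72 - 2.58) does not answer the question.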


APPENDIX L
ITEM-SPECIFIC RUBRICS FOR LINGUISTIC EXPLANATION SCORING


Task: FLOWER POT WRITING

3 points
Explains the computation (either multiplication, repeated addition, or counting by threes), referring to 20 pots and 3 seeds per pot.

2 points
Explains the computation and refers to either the number of pots or the number of seeds, but not both, OR labels seeds and pots but doesn't completely explain the computation.

1 point
Refers to the operation but not to the context, OR labels the number of seeds and the number of pots, OR "I know my answer is correct because 20 x 3 = 60", OR a good explanation that does not support the work.

0 points
No text, OR restates the answer but does not explain the process (e.g., "Kevin needs 60 seeds") as the only text, OR a completely incorrect or irrelevant explanation.


Appendix L (Continued)

Task: MS. HERNANDEZ WRITING

3 points
Explains the division with mention of groups/teams and relates the remainder to the number of substitutes, OR explains multiplication and subtraction. May have a minor flaw.

2 points
Explains the division and remainder w/ moderate flaws, OR explains the computation process, OR tells the number of teams and the number left over, OR states that the remainder was the number of substitutes.

1 point
Shows each team being added to total 32 (no words, but team numbers labeled), OR tells that 8 was divided into 34 and stops there, OR states the operation used.

0 points
No text, OR restates the answer as the only text (e.g., "There will be 2 substitutes"), OR a completely incorrect explanation.


Appendix L (Continued)

Task: CALORIES WRITING

3 points
Explanation of using addition and finding a sum or total that is about or close to 600 calories. Must include context.

2 points
Mention of adding and "about" or "close", but no reference to the target sum, OR a good explanation of the process w/o context.

1 point
Wrote out some possible item combinations, OR states that one must add, OR states "guess and check", OR tells what two items were added for the answer, OR states rounding.

0 points
No text, OR a completely incorrect explanation that in no way supports the work, OR simply restates the answer.


Appendix L (Continued)

Task: JEAN'S CLASS WRITING

3 points
Text that addresses "twice" or "double" the number of girls to get the number of boys and adding them together; must include boys and girls in the explanation.

2 points
Addresses twice as many boys, but omits addressing adding the girls or including the girls, OR complete except it omits an explanation of how the number of boys was obtained, OR a complete explanation of the computation process (no context).

1 point
States "20 boys + 10 girls = 30", OR states the operation used, OR labeled boys and girls with words or abbreviations.

0 points
No text, OR simply restates the answer, OR text that in no way supports the work.


Appendix L (Continued)

Task: PIZZA WRITING

3 points
Mentions that Jose's pizza is bigger and that his half would be larger than Ella's half, OR Jose's pizza is bigger, labels the halves and both pizzas. Must include 1/2 in words or symbols and must draw a conclusion.

2 points
Mentions Jose's pizza bigger or Ella's pizza smaller, OR says that Ella is right, they ate the same amount, OR explains that Ella could have eaten 1/2 of 1/2, OR a good explanation but no conclusion, OR Jose is right if, e.g., the pizza was cut wrong.

1 point
Labeled the pizzas as Jose's and Ella's, OR explains, but the text is mostly contradictory with itself or the picture, OR a conclusion w/o an explanation, OR a poor explanation w/ no conclusion.

0 points
No text, OR text that in no way supports the work or picture.


Appendix L (Continued)

Task: STRING WRITING

3 points
Explained the division and the number of pieces of string, OR explained the 3/4 = 6/8 equivalency and the number of pieces of string.

2 points
Explains the equivalency of 3/4 = 6/8, but omits the context.

1 point
Attempted an explanation, but seriously flawed, OR labels only.

0 points
No text, OR no correct, relevant text, OR restates the answer in words.


Appendix L (Continued)

Task: BALLOONS WRITING

3 points
Explains marking off a number of sections of balloons and the number of balloons in each section, OR explains the number of groups and repeated addition.

2 points
Explanation includes either marking off sections of balloons or the number of balloons in each section, but not both, OR explanation includes the whole process with no context.

1 point
Labeled the sections and the balloons in each, OR, e.g., "multiply 32 by 7", OR "about".

0 points
No text or labeling, OR no text that supports the work, OR restates the answer in words.


Appendix L (Continued)

Task: FLOUR WRITING

3 points
Explains the multiplication (or addition) with reference to the number of cups per batch and the number of batches.

2 points
Explains the whole computation process w/o context.

1 point
Explains only part of the computation process (e.g., 3 x 1/3 = 1), OR states the operation used, OR text attempts to support work that makes no sense, OR labels only.

0 points
No text, OR no text that supports the work, OR labels or restates the answer of needing 4 cups.


Appendix L (Continued)

Task: CALCULATOR WRITING

3 points
Both responses explain what to do and show what the result will be, OR one response shows what to do and the result while the second response explains what to do.

2 points
Both responses explain what to do, but omit reference to the result (obtaining 8375), OR one response explains what to do and the result, but the other is not complete or doesn't make sense.

1 point
Tells to add, OR tells briefly what to do for one response, OR tells one response, but it doesn't make complete sense, OR two responses that are vaguely worded.

0 points
No text, OR no text that supports the work.


Appendix L (Continued)

Task: SCHOOL LUNCH WRITING

3 points
Explains the process of solving the problem and includes the context of the items purchased, the number of days, and the number of bills needed, OR explains how the total for the week was obtained, rounding, and a reference to dollar bills.

2 points
Explains most of the process of solving the problem, but omits most or all of the context.

1 point
Explains rounding the final answer, OR labels food items, OR tells what operations were used.

0 points
No text, OR no text that supports the work, OR simply restates the answer of needing 9 dollars.


Appendix L (Continued)

Task: PENCILS WRITING

3 points
The problem-solving process is explained in a way that relates 1/4 to 5 pencils and either mentions that Sara's and Brendan's pencils total 20 or that 15 would be left for Brendan.

2 points
Explains the context or the computation, but not both, OR says, e.g., that Sara has 5 pencils and Brendan has 15 pencils, OR a good explanation, but an error in understanding fractions.

1 point
Labels only, OR restates part of the problem, OR explains only one point, OR some correct explanation, but contradictory statements included.

0 points
No text, OR no correct relevant text.


Appendix L (Continued)

Task: BIRDSEED WRITING

3 points
Explains the process of dividing to obtain the cost per pound and subtracting to find the difference.

2 points
Explains either the process of dividing to obtain the cost per pound OR subtracting to find the difference, with some context included, OR explains all of the computation, but omits the context, OR labels the cost per pound.

1 point
Labels only, OR a minimal explanation that doesn't contribute much, OR restates some of the information given in the problem.

0 points
No text, OR no correct relevant text, OR simply states the answer.


APPENDIX M
SOLUTION PROCESSES IDENTIFIED FOR EACH QUESTION


FLOWER POTS

Process  Category  Theme  Freq.  Percent  Description
0        U         17     0      0.00     No meaningful process shown
1        S         1      164    78.48    Multiplied 20 x 3
2        S         1      22     9.28     Multiplied 5 x 4, then 20 x 3
3        U         13     6      2.53     Divided 20 by 3
4        S         4      17     7.17     Counted by 3s (minor computational error okay)
5        S         2      3      1.27     15 seeds per column and 4 columns, either repeated addition or multiplication
6        S         2      6      2.53     12 seeds per row and 5 rows, either repeated addition or multiplication
7        P         7      3      1.27     5 x 4 = 20, stopped there
8        P         9      4      1.69     Multiplied incorrect numbers (miscounted pots)
9        S         2      1      0.42     Grouped pots by 3, nine seeds per group, 6 groups + 6 seeds
10       S         2      1      0.42     Combined 20 3s to get 10 6s, then 5 12s, then 2 24s + 12
12       S         2      1      0.42     Grouped pots by 10 = 30 seeds, 2 groups
14       U         14     1      0.42     Added 20 + 3
15       S         5      3      1.27     Correct answer of 60 but no work
16       P         6      2      0.84     Like #6, but incorrect number of seeds per row or incorrect number of rows
17       U         17     3      1.27     Multiplied incorrect numbers (major error)

Note. Category: U = unsuccessful, P = partially successful, S = successful. Themes are defined in Table 13. Missing process numbers indicate processes for which the frequency dropped to zero when students with missing data were removed from the study.


Appendix M (Continued)

MS. HERNANDEZ

Process  Category  Theme  Freq.  Percent  Description
0        U         17     2      0.84     No meaningful process shown
1        S         1      128    54.01    34 divided by 8 with a final answer of 2
2        S         4      10     4.22     Repeated addition to get close to 34, or counted by 8s, with a final answer of two
3        S         3      10     4.22     Solved with a picture using groups of 8 with a final answer of 2
4        S         3      3      1.27     Solved with a picture using 8 groups and a final answer of 2
5        U         8      3      1.27     Multiplied 34 x 8
6        S         4      32     13.50    Multiplied 8s to get close to 34 and a final answer of 2
7        P         7      7      2.95     Subtracted, 34 - 8 = 26
8        S         4      4      1.69     Repeated subtraction of 8s and a final answer of 2
9        P         4      2      0.84     Calculated using wrong numbers but good logic
10       U         15     2      0.84     Unsuccessful attempt at solving through a picture
11       U         17     1      0.42     Divided 34 by 4
12       S         5      2      0.84     Correct answer, no work
13       P         7      21     8.86     Like #1, but either did not answer or misunderstood the question
14       P         7      1      0.42     Like #3, but either did not answer or misunderstood the question
15       P         7      2      0.84     Like #6, but either did not answer or misunderstood the question
16       P         7      1      0.42     Like #8, but either did not answer or misunderstood the question
17       P         6      6      2.53     Like #6, but mathematical work seriously flawed


Appendix M (Continued)

CALORIES

Process  Category  Theme  Freq.  Percent  Description
0        U         17     1      0.42     No meaningful process shown
1        S         1      128    54.01    Added multiple pairs of exact numbers correctly; correct answer; minor flaws okay
2        S         1      3      1.27     Added multiple pairs of rounded numbers correctly; correct answer; minor flaws okay
3        S         1      30     12.66    Added only one pair (the correct one) with exact numbers
4        S         1      22     9.28     Added only one pair (the correct one) with rounded numbers
5        U         13     4      1.69     Added only one pair (the wrong one) with exact numbers
7        U         13     3      1.27     Tried to find multiples of one item or combinations of 3-4 items to total 600
8        S         1      2      0.84     Added pairs of exact numbers and rounded numbers with the correct answer
9        S         5      2      0.84     Stated a strategy such as estimate or round; no work; correct answer
11       U         17     1      0.42     Explanation implies a strategy; no work; answer incorrect
12       U         17     1      0.42     Like #9, but incorrect answer
13       U         17     17     7.17     Like #1, but incorrect answer and major flaws in computation and logic
16       P         6      2      0.84     Like #6, but with good logic, computational error
17       P         7      11     4.64     Like #1, good logic but incorrect answer, error in understanding the question
18       P         6      10     4.22     Like #1, good logic but incorrect answer, computational error


Appendix M (Continued)

JEAN'S CLASS

Process  Category  Theme  Freq.  Percent  Description
0        U         17     5      2.11     No meaningful process shown
1        S         1      81     34.18    10 x 2 = 20, + 10 = 30
2        P         7      71     29.96    10 x 2 = 20 (boys or kids); may have a minor computational error
3        S         1      15     6.33     10 + 10 + 10 = 30 (or 10 + 10 = 20, + 10 = 30)
4        P         7      3      1.27     20 boys with no work shown
5        S         1      25     10.55    20 boys + 10 girls = 30 kids
6        P         8      5      2.11     30 boys or 12 boys; misunderstands "twice" (thinks it means x 3 or + 2)
7        P         7      14     5.91     10 + 10 = 20 boys
8        S         2      4      1.69     10 x 3 = 30 kids
9        U         13     1      0.42     Doubled girls and boys for 20 + 20 = 40
10       P         7      3      1.27     Misunderstood the problem; either read "twice as many girls as boys" or that there are 10 boys and girls in the class
11       S         1      8      3.38     10 x 2 = 20 or 10 + 10 = 20 with the answer 20 boys, 10 girls
12       S         5      1      0.42     20 boys, 10 girls, no work
13       S         5      1      0.42     30, no work


Appendix M (Continued)

PIZZA

Process  Category  Theme  Freq.  Percent  Description
0        U         17     10     4.22     No meaningful process shown
1        S         1      99     41.77    Jose's pizza bigger, or Ella's pizza smaller
2        P         8      39     16.46    Both ate the same amount; explanation logical
3        S         2      4      1.69     Jose right because he had more toppings on his pizza
4        S         2      9      3.80     Jose right because he ate more pieces (bigger pizzas have more pieces)
5        P         8      8      3.38     Jose ate more because the slices were bigger (not clear that the pizza was bigger)
6        U         14     4      1.69     Cutting a different configuration changes the area (e.g., horizontal vs. vertical, or cutting one half into more pieces than another identical half)
7        U         16     21     8.86     Situation not as it appears (somebody cheated, ate another bite, didn't finish)
8        P         10     7      2.95     Ella ate 1/2 of 1/2
9        P         10     11     4.64     Pizza cut wrong (not really in half)
10       P         11     7      2.95     1/2 and 1/2 shown, no stated conclusion
11       P         7      3      1.27     Pizzas different sizes, no conclusion
12       P         7      3      1.27     Concluded Jose right because they ate different pizzas (no mention of size)
14       U         14     1      0.42     Pizzas different shapes (e.g., oval vs. round)
16       U         13     1      0.42     Both are right
17       U         14     2      0.84     Like #2, but explanation not logical
18       U         17     8      3.38     Explanation so unclear or contradictory that it was impossible to determine the process


Appendix M (Continued)

STRING

Process  Category  Theme  Freq.  Percent  Description
0        U         17     52     21.94    No meaningful process shown
1        U         15     18     7.59     Drew 3/4 of something, no other valid work
2        U         14     6      2.53     Added 3/4 + 1/8
3        P         11     10     4.22     Subtracted 3/4 - 1/8 and stopped
4        U         15     19     8.02     Drew something and divided it into 8 parts or eighths; incorrect answer
5        S         1      13     5.49     Found 3/4 = 6/8 via computation; answer correct
6        S         1      2      0.84     Did 3/4 divided by 1/8 and answer correct
8        P         9      26     10.97    Drew 3/4 of something and 1/8 of something; incorrect answer
9        S         2      4      1.69     Converted the problem to inches and feet and answer correct
10       U         17     1      0.42     Added ___ + ___
11       U         14     16     6.75     Did computation using the numerals in the fractions w/o fractional meaning
12       U         17     1      0.42     Added various fractions (not 3/4 and 1/8)
13       U         17     1      0.42     Subtracted various fractions
16       U         14     1      0.42     1 foot divided by 1/8 yard
18       U         17     2      0.84     3/8 x 1/8
19       S         5      1      0.42     Correct answer, no work
20       S         3      24     10.13    Like #4 or #5, but solved via drawing/diagram
21       U         12     3      1.27     Like #14, but answer incorrect; major flaw
22       U         12     10     4.22     Like #7, but answer incorrect; major flaw
23       U         12     5      2.11     Incorrect due to conceptual/computational problems with fractions
24       P         7      1      0.42     Misunderstood the question, but some valid work


Appendix M (Continued)

STRING (Continued)

Process  Category  Theme  Freq.  Percent  Description
25       P         6      11     4.64     Like #6, but answer incorrect; moderate flaw
26       U         12     3      1.27     Like #15, but answer incorrect; major flaw
27       U         12     1      0.42     Like #9, but answer incorrect; major flaw
28       U         15     1      0.42     Like #17, but attempted to solve through drawing/diagram
29       P         7      4      1.69     Like #5, but did not answer the question (answered 6/8)
30       P         7      1      0.42     Like #20, but did not answer the question (answered 6/8)


Appendix M (Continued)

BALLOONS

Process  Category  Theme  Freq.  Percent  Description
1        S         1      105    44.30    Multiplied 32 balloons by 5-8 sections; minor computational error okay
2        S         1      34     14.35    Rounded to 30 balloons and multiplied by 5-8 sections; minor computational error okay
3        S         4      18     7.59     Repeated addition of 32, 5-8 times
4        U         13     9      3.80     Answer of 32 or less without work
5        S         1      28     11.81    Multiplied or added using 32 balloons per section, rounded final answer within range (150-280)
6        S         1      6      2.53     Added some sections mentally and did the rest on paper; answer within range (150-280)
7        S         2      5      2.11     Grouped sections to aid in addition; answer within range (150-280)
8        P         8      5      2.11     Added or multiplied 2-4 sections of 32 balloons
9        S         5      7      2.95     Reasonable estimate with no work shown
10       P         8      1      0.42     Rounded to 30 and multiplied or added for 8-10 sections
11       P         8      7      2.95     Added or multiplied by 32 with 8-10 sections
12       U         13     1      0.42     Manually counted the balloons
13       P         8      2      0.84     Doubled 32 for each of 5-8 sections to get a very large answer
14       U         17     1      0.42     Multiplied numbers not in the problem
15       U         17     5      2.11     Unreasonable estimate; not enough work shown to follow the process
16       U         14     1      0.42     Divided 32 by the number of sections
17       S         1      1      0.42     Same as #2 but rounded the final answer; answer in range
18       S         4      1      0.42     Combination of repeated addition and estimation; final answer in range


Appendix M (Continued)

FLOUR

Process  Category  Theme  Freq.  Percent  Description
0        U         17     8      3.38     No meaningful process shown
1        S         1      23     9.70     1 1/3 x 3; correct answer
2        S         1      53     22.36    1 1/3 + 1 1/3 + 1 1/3; correct answer
3        S         3      6      2.53     Solved strictly with a picture/diagram; correct answer
4        P         6      20     8.44     1 1/3 x 3, but computational problems (e.g., answered 3 3/9)
5        P         6      4      1.69     Converted the fractions to decimals (incorrectly), then did the computation
6        S         2      5      2.11     Converted the fractions to decimals (correctly), then did the computation
7        P         6      73     30.80    1 1/3 + 1 1/3 + 1 1/3 with computational problems
8        P         6      8      3.38     Only computed the fractions (either answered 1 cup with a fraction or ignored the 1 cup)
9        U         12     5      2.11     Added 3 + 1 1/3
10       U         17     5      2.11     Computed with numbers not related to the problem
11       U         17     1      0.42     Divided numbers in the problem
12       P         6      12     5.06     Only computed the whole numbers (e.g., answered 3 1/3)
13       S         5      3      1.27     Correct answer without work
14       P         6      4      1.69     No work shown, computational problems (e.g., answered 3 3/9)
15       S         5      1      0.42     Correct answer with the logic explained but no work
16       P         9      4      1.69     Like #3, but only partially correct
17       U         14     2      0.84     Like #3, but incorrect


Appendix M (Continued)

CALCULATOR (frequencies and percents represent combined totals for methods one and two)

Process  Category  Theme  Freq.  Percent  Description
0        U         17     92     19.41    No meaningful process shown
1        S         1      138    29.11    Add 100
2        S         1      32     6.75     Add a number and subtract a second number that is 100 less than the number added
3        S         1      47     9.92     Add a combination of numbers that totals 100
4        S         2      12     2.53     Erase back to the 2 and replace it with a 3
5        U         12     6      1.27     Add a number that does not give the correct result
6        P         8      31     6.54     Partially valid response, i.e., one that works only with some operations (e.g., add 100 to the answer, add 100 to the other number)
7        U         16     64     13.50    Nonsense response (e.g., do it on paper, get another calculator, ask your parents)
8        S         1      8      1.69     Add a x b which equals 100
9        U         12     12     2.53     Subtract a number that does not give the correct result
11       S         2      6      1.27     Compute to get zero and add the correct number (e.g., multiply by 0, subtract 8275)
12       P         8      12     2.53     Response shows good logic but a place-value misunderstanding (e.g., add 1000, or add 1)
13       U         12     1      0.21     Combination of + and - that does not give the correct result
14       U         14     3      0.63     Operational misunderstanding (e.g., multiply by 100)
15       S         2      3      0.63     Hit = and start over
16       U         14     2      0.42     Enter the correct number without clearing the incorrect number
17       U         17     4      0.84     Explanation so unclear it was impossible to determine the process
18       P         11     1      0.21     Good logic but does not address the problem

Note. When a student repeats a response, the second one is categorized as ___.


Appendix M (Continued)

SCHOOL LUNCH

Process  Category  Theme  Freq.  Percent  Description
0        U         17     10     4.22     No meaningful process shown
1        S         1      86     36.29    $9 answer found by adding 3 items x 5, then rounding (minor computational error okay)
2        P         7      24     10.13    $8.75 answer found by adding 3 items x 5 (minor computational error okay)
3        P         7      17     7.17     $1.75 answer; total for one day without rounding (minor computational error okay)
4        P         7      21     8.86     $2.00 answer; rounded total for one day
5        S         1      23     9.70     $10.00 answer found by rounding the total for one day x 5 (minor computational error okay)
6        P         7      1      0.42     Found the cost of 1 item for 5 days
7        U         13     3      1.27     Added 3 items and $1.00
8        P         6      4      1.69     Like #1, with a major computational error
9        P         6      2      0.84     Like #2, with a major computational error
10       S         1      11     4.64     Multiplied each item by 5, added, and rounded (minor computational error okay)
11       P         7      3      1.27     Added each item five times without final rounding
12       P         7      3      1.27     Added 3 items, then added $1.75 five times without final rounding
13       P         7      2      0.84     Multiplied each item by 5, added, but did not round
14       U         13     3      1.27     Interpreted the question as what can Sam buy for $1.00
15       P         7      6      2.53     Worked the problem with only 2 items or for fewer than five days
16       P         6      1      0.42     Rounded the total for one day x 5, but with a major computational error
17       S         4      9      3.80     Like #12, but rounded the final answer
18       S         4      3      1.27     Like #11, but rounded the final answer
21       P         6      2      0.84     Good logic, incorrect work
22       P         7      3      1.27     Some correct work, but misunderstood the question


Appendix M (Continued)

PENCILS

Process  Category  Theme  Freq.  Percent  Description
0        U         17     39     16.46    No meaningful process shown
1        S         1      76     32.70    1/4 of 20 = 5 (stated or implied) and one of the following: 5 + 15 = 20, 20 - 15 = 5, 15 left
2        S         3      13     5.49     Solved with a picture only, showing 20 pencils, Brendan with 15 and Sara with 5
3        P         7      25     10.55    Brendan 15 and Sara 5, or Sara 5 and 20 together
4        S         2      17     7.17     Brendan has 3/4 (15) and Sara has 1/4 (5)
5        P         7      11     4.64     20 - 15 = 5 or 15 + 5 = 20
6        P         7      2      0.84     Finds 1/4 of 20 = 5 and stops
7        P         6      10     4.22     Incorrect due to minor fraction computational/conceptual problems (e.g., 1/4 of 20 = 4 with the balance of the work correct)
8        U         13     19     8.02     Incorrect due to misunderstanding of the problem
9        S         1      9      3.80     20 ÷ 4 = 5 or 5 x 4 = 20, then 5 + 15 = 20
10       U         15     6      2.53     Solved with a picture only, but the picture was incorrect or not productive
11       U         12     10     4.22     Like #7, but a major problem


Appendix M (Continued)

BIRDSEED

Process  Category  Theme  Freq.  Percent  Description
0        U         17     21     8.86     No meaningful process shown
1        S         1      62     26.16    Found the price per lb. and the difference (minor computational error okay)
2        P         11     102    43.04    Subtracted 3.72 - 2.58
3        P         7      4      1.69     Found the price per pound for one item only
4        U         13     5      2.11     Subtracted 3 pounds - 2 pounds = 1 pound
5        U         13     4      1.69     Added 2.58 + 3.72
6        P         6      5      2.11     Erroneous method for changing 3 pounds to 1 or 2 pounds (e.g., subtracting a dollar from the price for each pound less)
7        P         10     3      1.27     Found the price per lb for birdseed, multiplied by 3, and found the difference between the price for 3 lbs of birdseed and 3 lbs of sunflower seeds
8        P         11     2      0.84     Rounded the prices, then subtracted
9        U         13     3      1.27     Multiplied pounds by price (e.g., 2.58 x 2 and/or 3.72 x 3)
10       P         8      1      0.42     Finds the price per pound for the two items and adds them together
11       U         17     6      2.53     Did computation with numbers not in the problem
12       U         14     4      1.69     Subtracted 2.58 - 3.72
13       S         4      2      0.84     Found the price per pound by trial-and-error addition, then subtracted; minor computational error okay
14       S         5      1      0.42     Found the price per pound without written work, then subtracted
15       U         17     3      1.27     Added 2.58 + 1.14 or subtracted 2.58 - 1.14
16       P         6      1      0.42     Attempts to compute the price per pound but cannot do the computation
17       P         7      4      1.69     Found the price per pound, but the remainder of the work was not meaningful
18       P         8      1      0.42     Computed 3.72 - 2.58
19       P         8      1      0.42     Price per pound minus 1.14
20       P         12     2      0.84     Like #13, but with a major error


APPENDIX N
DESCRIPTIVE STATISTICS FOR MATHEMATICS PERFORMANCE BY QUESTION BY GROUP


Descriptive Statistics for Mathematics Performance by Question by Gender

Question        Boys M (SD)    Girls M (SD)
Flowerpots      2.98 (0.89)    3.19 (0.73)
Ms. Hernandez   2.72 (0.87)    2.63 (0.91)
Calories        2.94 (0.77)    2.70 (0.98)
Jean's Class    2.74 (0.87)    2.61 (1.09)
Pizza           2.28 (1.30)    2.34 (1.31)
String          0.91 (1.09)    0.88 (1.04)
Balloons        2.84 (0.88)    2.99 (0.88)
Flour           2.18 (1.01)    1.87 (1.01)
Calculator      1.81 (1.38)    1.77 (1.46)
School Lunch    2.20 (1.02)    2.23 (1.06)
Pencils         2.09 (1.38)    1.61 (1.56)
Birdseed        1.61 (1.21)    1.29 (1.31)

Note. N = 237; max. mean score = 4 points.


Appendix N (Continued)

Descriptive Statistics for Mathematics Performance by Question by Mathematics Ability

Question        High M (SD)    Middle M (SD)   Low M (SD)
Flowerpots      3.35 (0.69)    3.03 (0.78)     2.88 (0.99)
Ms. Hernandez   3.04 (0.53)    2.72 (0.82)     2.18 (1.13)
Calories        3.19 (0.64)    2.77 (0.93)     2.59 (0.85)
Jean's Class    3.16 (0.77)    2.67 (0.91)     2.18 (1.09)
Pizza           3.02 (1.16)    2.22 (1.26)     1.75 (1.23)
String          1.77 (1.32)    0.67 (0.83)     0.51 (0.67)
Balloons        3.32 (0.66)    2.86 (0.84)     2.57 (1.04)
Flour           2.79 (0.80)    1.99 (0.91)     1.31 (0.97)
Calculator      2.32 (1.43)    1.87 (1.40)     1.00 (1.08)
School Lunch    2.84 (0.59)    2.24 (0.93)     1.45 (1.19)
Pencils         2.88 (1.25)    1.72 (1.47)     1.12 (1.09)
Birdseed        2.53 (1.20)    1.35 (1.14)     0.57 (0.70)

Note. N = 237; max. mean score = 4 points.


Appendix N (Continued)

Descriptive Statistics for Mathematics Performance by Question by Reading Ability

Question        High M (SD)    Middle M (SD)   Low M (SD)
Flowerpots      3.19 (0.70)    3.02 (0.90)     3.03 (0.76)
Ms. Hernandez   2.99 (0.54)    2.62 (0.93)     2.20 (1.10)
Calories        3.05 (0.76)    2.78 (0.91)     2.50 (0.86)
Jean's Class    3.05 (0.81)    2.58 (0.96)     2.23 (1.14)
Pizza           2.90 (1.22)    2.16 (1.22)     1.50 (1.28)
String          1.51 (1.28)    0.63 (0.81)     0.60 (0.89)
Balloons        3.19 (0.74)    2.85 (0.89)     2.47 (0.97)
Flour           2.51 (0.88)    1.96 (0.97)     1.23 (1.01)
Calculator      2.25 (1.49)    1.73 (1.35)     0.93 (1.05)
School Lunch    2.78 (0.65)    2.10 (0.98)     1.33 (1.23)
Pencils         2.68 (1.35)    1.63 (1.40)     0.97 (1.22)
Birdseed        2.33 (1.31)    1.16 (1.06)     0.73 (0.83)

Note. N = 237; max. mean score = 4 points.


APPENDIX O
DESCRIPTIVE STATISTICS FOR LINGUISTIC EXPLANATIONS BY QUESTION BY GROUP


Descriptive Statistics for Linguistic Explanations by Question by Gender

Question        Boys M (SD)    Girls M (SD)
Flowerpots      1.30 (1.06)    1.44 (1.18)
Ms. Hernandez   1.25 (0.94)    1.36 (0.98)
Calories        1.30 (1.00)    1.28 (0.98)
Jean's Class    1.27 (0.97)    1.38 (1.04)
Pizza           1.83 (0.84)    2.01 (0.73)
String          0.68 (0.72)    0.66 (0.78)
Balloons        1.15 (1.05)    1.28 (1.19)
Flour           0.88 (0.85)    0.95 (0.92)
Calculator      1.29 (0.93)    1.31 (0.98)
School Lunch    0.76 (0.87)    0.93 (0.91)
Pencils         1.40 (1.00)    1.20 (1.05)
Birdseed        0.71 (0.75)    0.85 (0.76)

Note. N = 237; max. mean score = 4 points.


Appendix O (Continued)

Descriptive Statistics for Linguistic Explanations by Question by Mathematics Ability

Question        High M (SD)    Middle M (SD)   Low M (SD)
Flowerpots      1.77 (1.07)    1.26 (1.08)     1.20 (1.15)
Ms. Hernandez   1.65 (0.92)    1.26 (0.94)     1.00 (0.96)
Calories        1.54 (1.05)    1.26 (1.01)     1.10 (0.81)
Jean's Class    1.61 (1.03)    1.28 (0.99)     1.10 (0.92)
Pizza           2.14 (0.72)    1.89 (0.77)     1.71 (0.88)
String          1.05 (0.85)    0.57 (0.69)     0.49 (0.58)
Balloons        1.75 (1.02)    1.09 (1.11)     0.88 (1.03)
Flour           1.42 (0.91)    0.78 (0.81)     0.67 (0.82)
Calculator      1.51 (1.07)    1.40 (0.91)     0.82 (0.77)
School Lunch    1.26 (0.88)    0.67 (0.87)     0.76 (0.84)
Pencils         1.88 (1.00)    1.23 (1.03)     0.86 (0.72)
Birdseed        1.16 (0.84)    0.71 (0.73)     0.53 (0.54)

Note. N = 237; max. mean score = 4 points.


Appendix O (Continued)

Descriptive Statistics for Linguistic Explanations by Question by Reading Ability

Question        High M (SD)    Middle M (SD)   Low M (SD)
Flowerpots      1.59 (1.04)    1.32 (1.15)     1.03 (1.07)
Ms. Hernandez   1.45 (0.96)    1.37 (0.94)     1.07 (1.01)
Calories        1.38 (1.01)    1.31 (1.00)     0.97 (0.85)
Jean's Class    1.55 (0.99)    1.25 (1.01)     1.10 (0.92)
Pizza           2.15 (0.76)    1.89 (0.73)     1.43 (0.94)
String          0.97 (0.83)    0.56 (0.68)     0.43 (0.57)
Balloons        1.63 (1.11)    1.06 (1.10)     0.83 (0.91)
Flour           1.23 (0.89)    0.80 (0.86)     0.63 (0.76)
Calculator      1.51 (1.06)    1.34 (0.88)     0.67 (0.71)
School Lunch    1.18 (0.93)    0.69 (0.84)     0.67 (0.84)
Pencils         1.77 (1.03)    1.18 (0.96)     0.77 (0.86)
Birdseed        1.16 (0.88)    0.64 (0.63)     0.43 (0.57)

Note. N = 237; max. mean score = 4 points.


APPENDIX P
DESCRIPTIVE STATISTICS FOR SELF-EFFICACY BY QUESTION BY GROUP


Descriptive Statistics for Self-Efficacy by Question by Gender

Question        Boys M (SD)    Girls M (SD)
Flowerpots      3.81 (0.43)    3.68 (0.61)
Ms. Hernandez   3.46 (0.84)    2.95 (1.11)
Calories        3.55 (0.70)    3.24 (0.80)
Jean's Class    3.82 (0.46)    3.44 (0.91)
Pizza           2.80 (1.23)    2.48 (1.30)
String          1.96 (1.32)    1.46 (1.23)
Balloons        3.29 (0.83)    3.06 (0.82)
Flour           3.26 (0.96)    2.61 (1.21)
Calculator      2.88 (1.32)    2.43 (1.44)
School Lunch    3.34 (0.94)    2.93 (1.13)
Pencils         2.77 (1.34)    2.28 (1.48)
Birdseed        3.02 (1.23)    2.52 (1.26)

Note. N = 237; max. mean score = 4 points.


Appendix P (Continued)

Descriptive Statistics for Self-Efficacy by Question by Mathematics Ability

Question        High M (SD)    Middle M (SD)   Low M (SD)
Flowerpots      3.89 (0.31)    3.74 (0.52)     3.61 (0.67)
Ms. Hernandez   3.81 (0.44)    3.21 (0.98)     2.63 (1.15)
Calories        3.54 (0.71)    3.35 (0.78)     3.41 (0.75)
Jean's Class    3.91 (0.34)    3.66 (0.69)     3.31 (0.97)
Pizza           3.02 (1.14)    2.64 (1.27)     2.29 (1.32)
String          2.11 (1.30)    1.67 (1.30)     1.47 (1.24)
Balloons        3.35 (0.77)    3.20 (0.81)     2.96 (0.92)
Flour           3.28 (1.03)    2.95 (1.11)     2.63 (1.20)
Calculator      2.91 (1.41)    2.74 (1.27)     2.24 (1.58)
School Lunch    3.56 (0.80)    3.16 (1.03)     2.69 (1.17)
Pencils         3.32 (1.15)    2.39 (1.43)     2.08 (1.35)
Birdseed        3.31 (0.98)    2.61 (1.36)     2.67 (1.16)

Note. N = 237; max. mean score = 4 points.


Appendix P (Continued)

Descriptive Statistics for Self-Efficacy by Question by Reading Ability

Question        High M (SD)    Middle M (SD)   Low M (SD)
Flowerpots      3.82 (0.42)    3.70 (0.58)     3.77 (0.50)
Ms. Hernandez   3.66 (0.69)    3.12 (1.04)     2.67 (1.09)
Calories        3.53 (0.67)    3.38 (0.76)     3.23 (0.90)
Jean's Class    3.92 (0.40)    3.54 (0.81)     3.43 (0.77)
Pizza           2.99 (1.12)    2.57 (1.29)     2.20 (1.32)
String          2.10 (1.36)    1.55 (1.26)     1.67 (1.21)
Balloons        3.30 (0.79)    3.17 (0.87)     2.97 (0.72)
Flour           3.22 (1.12)    2.87 (1.13)     2.73 (1.05)
Calculator      2.75 (1.45)    2.75 (1.32)     2.13 (1.46)
School Lunch    3.38 (0.98)    3.06 (1.10)     3.00 (0.91)
Pencils         3.19 (1.22)    2.31 (1.42)     2.00 (1.39)
Birdseed        3.26 (1.01)    2.54 (1.32)     2.77 (1.28)

Note. N = 237; max. mean score = 4 points.


APPENDIX Q
DESCRIPTIVE STATISTICS FOR SELF-EFFICACY MEAN BIAS BY QUESTION BY GROUP


Descriptive Statistics for Self-Efficacy Mean Bias by Question by Gender

Question        Boys M (SD)    Girls M (SD)
Flowerpots      0.82 (0.90)    0.49 (0.93)
Ms. Hernandez   0.74 (0.98)    0.32 (1.21)
Calories        0.61 (0.97)    0.54 (1.13)
Jean's Class    1.08 (0.97)    0.82 (1.25)
Pizza           0.52 (1.48)    0.14 (1.70)
String          1.05 (1.45)    0.58 (1.27)
Balloons        0.46 (1.15)    0.64 (1.15)
Flour           1.08 (1.20)    0.74 (1.16)
Calculator      1.07 (1.38)    0.66 (1.52)
School Lunch    1.14 (1.25)    0.69 (1.29)
Pencils         0.68 (1.29)    0.67 (1.56)
Birdseed        1.41 (1.59)    1.23 (1.48)

Note. N = 237; max. mean score = 4 points.


Appendix Q (Continued)

Descriptive Statistics for Self-Efficacy Mean Bias by Question by Mathematics Ability

Question        High M (SD)    Middle M (SD)   Low M (SD)
Flowerpots      0.54 (0.66)    0.71 (0.93)     0.73 (1.15)
Ms. Hernandez   0.77 (0.68)    0.49 (1.03)     0.45 (1.57)
Calories        0.35 (0.92)    0.58 (1.04)     0.82 (1.14)
Jean's Class    0.75 (0.91)    0.98 (1.05)     1.14 (1.40)
Pizza           0.00 (1.36)    0.42 (1.65)     0.55 (1.67)
String          0.33 (1.20)    1.01 (1.44)     0.96 (1.33)
Balloons        0.04 (1.02)    0.34 (1.21)     0.39 (1.18)
Flour           0.49 (1.04)    0.96 (1.21)     1.31 (1.19)
Calculator      0.60 (1.18)    0.87 (1.45)     1.24 (1.70)
School Lunch    0.72 (0.80)    0.91 (1.32)     1.24 (1.69)
Pencils         0.44 (1.09)    0.67 (1.44)     0.96 (1.64)
Birdseed        0.79 (1.36)    1.26 (1.63)     2.10 (1.19)

Note. N = 237; max. mean score = 4 points.


Appendix Q (Continued)

Descriptive Statistics for Self-Efficacy Mean Bias by Question by Reading Ability

                        High            Middle          Low
Question                M      SD       M      SD       M      SD
Flowerpots              0.63   0.74     0.68   1.04     0.73   0.83
Ms. Hernandez           0.67   0.87     0.50   1.08     0.47   1.63
Calories                0.48   0.91     0.60   1.07     0.73   1.23
Jeans Class             0.86   0.77     0.96   1.19     1.20   1.40
Pizza                   0.08   1.45     0.41   1.62     0.70   1.76
String                  0.59   1.42     0.92   1.33     1.07   1.51
Balloons                0.11   1.09     0.32   1.20     0.50   1.17
Flour                   0.71   1.11     0.91   1.19     1.50   1.25
Calculator              0.51   1.36     1.01   1.42     1.20   1.69
School Lunch            0.60   1.09     0.96   1.22     1.67   1.67
Pencils                 0.51   1.24     0.69   1.45     1.03   1.65
Birdseed                0.93   1.31     1.39   1.64     2.03   1.38

Note. N = 237; maximum mean score = 4 points.


APPENDIX R

DESCRIPTIVE STATISTICS FOR SELF-ASSESSMENT BY QUESTION BY GROUP


Descriptive Statistics for Self-Assessment by Question by Gender

                        Boys            Girls
Question                M      SD       M      SD
Flowerpots              3.95   0.25     3.78   0.60
Ms. Hernandez           3.58   0.80     3.42   0.94
Calories                3.66   0.80     3.11   1.27
Jeans Class             3.84   0.61     3.51   0.87
Pizza                   2.90   1.26     2.90   1.15
String                  1.95   1.57     1.41   1.41
Balloons                3.32   0.87     3.26   0.88
Flour                   3.22   1.11     2.60   1.38
Calculator              2.78   1.43     2.55   1.51
School Lunch            3.61   0.74     3.14   1.15
Pencils                 2.95   1.46     2.25   1.69
Birdseed                3.14   1.27     2.75   1.36

Note. N = 237; maximum mean score = 4 points.


Appendix R (Continued)

Descriptive Statistics for Self-Assessment by Question by Mathematics Ability

                        High            Middle          Low
Question                M      SD       M      SD       M      SD
Flowerpots              3.96   0.19     3.90   0.39     3.71   0.70
Ms. Hernandez           3.86   0.40     3.54   0.78     3.02   1.19
Calories                3.72   0.67     3.44   1.08     2.98   1.27
Jeans Class             3.93   0.42     3.68   0.74     3.45   0.83
Pizza                   3.18   1.05     2.88   1.22     2.65   1.31
String                  2.33   1.52     1.49   1.48     1.53   1.45
Balloons                3.44   0.66     3.33   0.90     3.04   0.98
Flour                   3.53   0.76     2.95   1.30     2.25   1.34
Calculator              2.84   1.45     2.69   1.48     2.43   1.45
School Lunch            3.68   0.74     3.51   0.92     2.78   1.10
Pencils                 3.47   1.20     2.50   1.68     2.04   1.44
Birdseed                3.46   1.05     2.86   1.37     2.63   1.36

Note. N = 237; maximum mean score = 4 points.


Appendix R (Continued)

Descriptive Statistics for Self-Assessment by Question by Reading Ability

                        High            Middle          Low
Question                M      SD       M      SD       M      SD
Flowerpots              3.97   0.16     3.83   0.54     3.83   0.46
Ms. Hernandez           3.77   0.61     3.43   0.95     3.23   0.46
Calories                3.56   0.93     3.46   1.02     2.83   1.42
Jeans Class             3.88   0.53     3.63   0.76     3.50   0.82
Pizza                   3.23   1.09     2.82   1.19     2.43   1.41
String                  2.08   1.59     1.48   1.44     1.77   1.55
Balloons                3.44   0.71     3.25   0.93     3.13   0.97
Flour                   3.32   1.18     2.84   1.27     2.50   1.33
Calculator              3.77   1.49     2.65   1.49     2.53   1.33
School Lunch            3.70   0.64     3.28   1.11     3.20   0.85
Pencils                 3.33   1.34     2.43   1.64     1.83   1.42
Birdseed                3.40   1.14     2.75   1.39     2.87   1.25

Note. N = 237; maximum mean score = 4 points.


APPENDIX S

DESCRIPTIVE STATISTICS FOR SELF-ASSESSMENT MEAN BIAS BY QUESTION BY GROUP


Descriptive Statistics for Self-Assessment Mean Bias by Question by Gender

                        Boys            Girls
Question                M      SD       M      SD
Flowerpots              0.97   0.84     0.59   0.85
Ms. Hernandez           0.86   0.96     0.79   0.90
Calories                0.72   0.92     0.41   1.37
Jeans Class             1.10   0.95     0.90   1.26
Pizza                   0.62   1.35     0.56   1.38
String                  1.03   1.41     0.53   1.29
Balloons                0.48   1.14     0.27   1.12
Flour                   1.05   1.24     0.73   1.26
Calculator              0.67   1.39     0.78   1.42
School Lunch            1.41   1.04     0.91   1.06
Pencils                 0.87   1.06     0.64   1.51
Birdseed                1.53   1.51     1.46   1.42

Note. N = 237; maximum mean score = 4 points.


Appendix S (Continued)

Descriptive Statistics for Self-Assessment Mean Bias by Question by Mathematics Ability

                        High            Middle          Low
Question                M      SD       M      SD       M      SD
Flowerpots              0.61   0.73     0.87   0.80     0.82   1.11
Ms. Hernandez           0.82   0.47     0.82   0.93     0.84   1.27
Calories                0.53   0.73     0.67   1.23     0.39   1.31
Jeans Class             0.77   0.93     1.01   1.10     1.27   1.25
Pizza                   0.16   1.33     0.66   1.33     0.90   1.39
String                  0.54   1.10     0.82   1.44     1.02   1.48
Balloons                0.12   0.95     0.47   1.14     0.47   1.29
Flour                   0.74   0.94     0.96   1.31     0.94   1.45
Calculator              0.53   1.15     0.82   1.51     1.43   1.22
School Lunch            0.84   0.68     1.27   1.06     1.33   1.38
Pencils                 0.60   0.84     0.78   1.33     0.92   1.56
Birdseed                0.93   1.10     1.53   1.57     2.06   1.35

Note. N = 237; maximum mean score = 4 points.


Appendix S (Continued)

Descriptive Statistics for Self-Assessment Mean Bias by Question by Reading Ability

                        High            Middle          Low
Question                M      SD       M      SD       M      SD
Flowerpots              0.78   0.73     0.81   0.94     0.80   0.85
Ms. Hernandez           0.78   0.69     0.81   0.97     1.03   1.22
Calories                0.51   0.85     0.67   1.22     0.33   1.45
Jeans Class             0.82   0.81     1.05   1.15     1.27   1.46
Pizza                   0.33   0.81     0.66   1.37     0.93   1.31
String                  0.58   1.31     0.84   1.36     1.17   1.53
Balloons                0.25   1.00     0.40   1.17     0.67   1.27
Flour                   0.81   1.05     0.87   1.31     1.27   1.48
Calculator              0.52   1.21     0.92   1.47     1.60   1.25
School Lunch            0.92   0.66     1.17   1.13     1.87   1.33
Pencils                 0.64   0.95     0.81   1.37     0.87   1.61
Birdseed                1.07   1.22     1.59   1.56     2.13   1.31

Note. N = 237; maximum mean score = 4 points.


APPENDIX T

MULTIPLE REGRESSION SUMMARY TABLES INCLUDING INTERACTIONS


Multiple Regression Summary Tables with Interactions

Summary of Multiple Regression Analysis for Mathematics Performance

Variable                      b          β          SE         t         p
Intercept                  -16.9588     0.0        3.9026     -4.35*    <.0001
Gender                      -4.1253    -0.2843     5.7378     -0.72      0.4729
FCAT Mathematics Score       0.0774     0.4629     0.0132      5.86*    <.0001
FCAT Reading Score           0.0527     0.3058     0.0145      3.65*     0.0003
Gender*Mathematics           0.0074     0.1662     0.0214      0.34      0.7305
Gender*Reading               0.0048     0.1160     0.0212      0.23      0.8211

Note. N = 237. b = regression coefficient, β = standardized regression coefficient, SE = standard error; * indicates significance at the α = .05 level.
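For reference, the model summarized above (and in the three tables that follow) has the form

    \hat{Y} = b_0 + b_1 G + b_2 M + b_3 R + b_4 (G \times M) + b_5 (G \times R),

where G denotes gender, M the FCAT mathematics score, and R the FCAT reading score. Substituting the unstandardized coefficients (b) from this table gives, approximately,

    \widehat{\text{Performance}} \approx -16.96 - 4.13\,G + 0.077\,M + 0.053\,R + 0.007\,(G \times M) + 0.005\,(G \times R).

The numeric coding of gender (for example, 0/1) is not shown in the table, so the sign and size of the gender terms should be read with that caveat in mind.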


Appendix T (Continued)

Summary of Multiple Regression Analysis for Linguistic Explanations

Variable                      b          β          SE         t         p
Intercept                   -9.0291     0.0        5.2142     -1.73      0.0847
Gender                      -7.7868    -0.5312     7.6662     -1.02      0.3108
FCAT Mathematics Score       0.0375     0.2220     0.0176      2.13*     0.0345
FCAT Reading Score           0.0296     0.1701     0.0193      1.54      0.1260
Gender*Mathematics          -0.0109    -0.2430     0.0286     -0.38      0.7034
Gender*Reading               0.0366     0.8757     0.0283      1.29      0.1975

Note. N = 237. b = regression coefficient, β = standardized regression coefficient, SE = standard error; * indicates significance at the α = .05 level.


Appendix T (Continued)

Summary of Multiple Regression Analysis for Self-Efficacy Mean Bias

Variable                      b          β          SE         t         p
Intercept                    3.2043     0.0        0.4495      7.13*    <.0001
Gender                      -0.6620    -0.5180     0.6609     -1.00      0.3176
FCAT Mathematics Score      -0.0025    -0.1711     0.0015     -1.66      0.0988
FCAT Reading Score          -0.0043    -0.2814     0.0017     -2.57*     0.0109
Gender*Mathematics          -0.0000    -0.0100     0.0025     -0.02      0.9873
Gender*Reading               0.0010     0.2691     0.0024      0.40      0.6884

Note. N = 237. b = regression coefficient, β = standardized regression coefficient, SE = standard error; * indicates significance at the α = .05 level.


Appendix T (Continued)

Summary of Multiple Regression Analysis for Self-Assessment Mean Bias

Variable                      b          β          SE         t         p
Intercept                    3.3864     0.0        0.4092      8.28*    <.0001
Gender                      -1.2221    -1.0481     0.6016     -2.03*     0.0434
FCAT Mathematics Score      -0.0027    -0.2032     0.0014     -1.97*     0.0498
FCAT Reading Score          -0.0044    -0.3151     0.0015     -2.88*     0.0043
Gender*Mathematics           0.0025     0.6855     0.0023      1.09      0.2769
Gender*Reading               0.0004     0.1297     0.0022      0.19      0.8464

Note. N = 237. b = regression coefficient, β = standardized regression coefficient, SE = standard error; * indicates significance at the α = .05 level.
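The four tables in this appendix share one structure: a gender main effect, main effects for the two FCAT scores, and two gender-by-ability interaction terms. As a minimal sketch of how such a model could be fit (this is illustrative only, not the analysis actually used in this study), the Python example below uses the statsmodels formula interface. The column names, the 0/1 gender coding, and the randomly generated placeholder data are assumptions for illustration; they are not the study's data.

# Illustrative sketch only: column names (gender, fcat_math, fcat_read, sa_bias),
# the 0/1 gender coding, and the random placeholder data are assumptions,
# not the dissertation's dataset or analysis code.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 237
df = pd.DataFrame({
    "gender": rng.integers(0, 2, n),        # e.g., 0 = girl, 1 = boy (assumed coding)
    "fcat_math": rng.normal(300, 40, n),    # placeholder FCAT mathematics scores
    "fcat_read": rng.normal(300, 40, n),    # placeholder FCAT reading scores
})
# Placeholder outcome: self-assessment mean bias on a 0-4 scale
df["sa_bias"] = rng.uniform(0, 4, n)

# "gender * fcat_math" expands to gender + fcat_math + gender:fcat_math, so this
# formula reproduces the six terms in the tables above (intercept, gender, both
# FCAT scores, and the two gender-by-ability interactions).
model = smf.ols("sa_bias ~ gender * fcat_math + gender * fcat_read", data=df).fit()
print(model.summary())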


ABOUT THE AUTHOR

Patricia Hunsader received her bachelor's degree in Business from the University of South Florida, Sarasota, in 1986. She retired from that career to raise a family of four boys and one girl. She returned to school at the University of South Florida to enter the Master's program in Elementary Education, and completed that degree in 2000. She taught high school mathematics and fifth grade while pursuing her master's degree, and remained in teaching until she entered the Ph.D. program in 2001. While in the Ph.D. program at the University of South Florida, Mrs. Hunsader published six articles in state and national publications, gave presentations at several conferences and workshops, and taught undergraduate mathematics methods courses for pre-service elementary teachers.