A peer-reviewed scholarly journal
Editor: Gene V Glass, College of Education, Arizona State University

Copyright is retained by the first or sole author, who grants right of first publication to the EDUCATION POLICY ANALYSIS ARCHIVES. EPAA is a project of the Education Policy Studies Laboratory. Articles appearing in EPAA are abstracted in the Current Index to Journals in Education by the ERIC Clearinghouse on Assessment and Evaluation and are permanently archived in Resources in Education.

Volume 11 Number 45, December 3, 2003, ISSN 1068-2341

Portfolios, the Pied Piper of Teacher Certification Assessments: Legal and Psychometric Issues

Judy R. Wilkerson
William Steve Lang
University of South Florida, St. Petersburg

Citation: Wilkerson, J.R., & Lang, W.S. (2003, December 3). Portfolios, the Pied Piper of teacher certification assessments: Legal and psychometric issues. Education Policy Analysis Archives, 11(45). Retrieved [Date] from http://epaa.asu.edu/epaa/v11n45/.

Abstract

Since about 90% of schools, colleges, and departments of education are currently using portfolios of one form or another as decision-making tools for standards-based decisions regarding certification or licensure (as well as NCATE accreditation), it is appropriate to explore the legal and psychometric aspects of this assessment device. The authors demonstrate that portfolios being used in a high-stakes context are technically testing devices and therefore need to meet psychometric standards of validity, reliability, fairness, and absence of bias. These standards, along with federal law, form the cornerstone for legal challenges to high-stakes decisions when students are denied a diploma or license based on the results of the assessment. The conclusion
includes a list of requirements and caveats for using portfolios for graduation and certification decisions in a standards-based environment that help institutions reduce exposure to potential litigation.

The Portfolio: Panacea or Pandora's Box

Portfolios, both paper and electronic, have become hot topics in standards-based performance assessment. Salzman et al. (2002) report that almost 90% of schools, colleges, and departments of education (SCDEs) use portfolios to make decisions about candidates, and almost 40% do so as a certification or licensure requirement. In our own recent study of teacher preparation programs in Florida, we found that virtually every institution in the State is using portfolios in some way to help make certification decisions (Wilkerson and Lang, 2003). In fact, portfolios seem to be viewed by many as the panacea of performance assessment. It is hard to go to national meetings without being greeted by software professionals who have designed electronic portfolio products that they claim will help SCDEs meet state and national standards for accreditation and program approval. Yet, for many educators, the jury is still out. Some have not yet reached a conclusion about whether or not to use portfolios for teacher certification. Others are reconsidering this decision, having determined that the time involved for both faculty and candidates is excessive. Hence, there is a need to clarify the issues being raised nationwide.

As teacher educators, we view the standards movement as an appropriate impetus for the continuing professionalism of teaching when standards are used as a vehicle to redesign and improve teacher education curriculum and licensure (Wilkerson et al., 2003). Standards provide a vehicle for professionals to articulate what they believe is important, and this is probably why there are so many sets of standards.
There are standards from the Interstate New Teacher Assessment and Support Consortium (INTASC) and from the Specialty Professional Associations (SPAs) affiliated with NCATE, as well as state program approval standards, state K-12 standards, institutional outcomes, and conceptual frameworks. Standards also provide a vehicle for college faculty to justify their curriculum and a challenge to SCDEs to manage the assessment process. NCATE and many states require the use of multiple assessments to deal adequately with their complexity.

As measurement professionals, we are frequently asked if portfolio assessment can be used as an appropriate and safe vehicle to make summative decisions in a certification context. Are portfolios good measurement? Our answer is this: No, unless the contents are rigorously controlled and systematically evaluated. As Ingersoll and Scannell (2002) pointed out, portfolios are not assessments, but are instead collections of candidate artifacts that present examples of what candidates can do. The contents need to be evaluated individually as part of the candidate's overall performance record using a database format.

Without proper attention to the psychometric requirements of sound assessment, teacher educators may find themselves on a slippery slope. SCDEs have to make sure that assessment devices are created and used properly, and that costs money. Otherwise, SCDEs may make bad decisions and face legal complaints that can have severe consequences: expensive
trials and court-imposed interventions, not to mention harm to institutional reputation. For example, Florida's Department of Education has an extensive history of assessment challenges on its web site: http://www.firn.edu/doe/sas/hsaphome.htm.

This does not mean that portfolios are bad or useless. They are excellent tools for reinforcing learning and for making formative decisions about candidate knowledge, skills, dispositions, and growth. However, when the decision is standards-based, summative, and results in initial certification, minimal competency must be established. Growth and learning are clearly important attributes of a quality teacher preparation program; however, these are not the critical assessment issues in initial certification. As important as they may be in determining if a certified teacher has achieved master or accomplished teacher status, this decision is vastly different from the one made for initial certification. In licensure, the state must ensure first and foremost that the teacher is safe to enter the profession and will leave no child behind.

Looking at this issue from a different viewpoint, in medicine, society would not dream of allowing physicians to be licensed based on their own selection of showcased successes. We recognize that many critical failures could be hidden behind their selected portfolio entries, and such failures could certainly prevent them from being safe practitioners. Medical licensure requires the identification and systematic assessment of a solid set of skills. Pilots, too, must pass a series of carefully constructed performance tests. We do not want to fly on an airplane where we forgot to measure whether or not the pilot could land the plane. Landing is part of minimal competence.

In portfolio assessment systems that allow candidates to choose their own artifacts, minimal competency with regard to standards is difficult to establish.
There are too many test forms to establish either validity or reliability. When faculty fail to adequately align the artifacts selected by candidates with specific aspects of standards that define performance requirements, the range of material may preclude adequate standards-based decisions. When faculty fail to assess artifacts with solid, standards-based rubrics, it is difficult to interpret what their decisions mean and to make appropriate inferences about what candidates know, can do, and believe.

Portfolio assessments, like all high-stakes tests, must stand the tests of validity, reliability, fairness, and absence of bias. If a candidate is denied graduation or certification based on a portfolio assessment that is not psychometrically sound, the candidate could successfully challenge the institution (and the state department that approved the program and its assessment system) in a court of law.

This article has been written to clarify the above opinions, which we recognize to be controversial. The issues are complex, technical, and inter-related. A thorough understanding of these issues requires somewhat detailed discussion of both the psychometric requirements of high-stakes testing and the legal requirements and decisions which are related to them. These are inextricably linked. If psychometric properties do not exist, the door to legal challenges from students is open. In considering the facts of the case, the courts then rely on
psychometric issues to make decisions regarding the infringement of students' legal rights. We hope that readers of this article will be better prepared to decide whether or not to use portfolios as a summative assessment and, if so, how to construct the requirements.

Since good teachers often attempt to make issues meaningful through some scenarios of what could happen, we will present a fictitious case study of Mary Beth JoAnne to introduce readers to the important psychometric and legal issues discussed in this article. After our fictitious case study, we discuss the roles of SCDEs and state departments of education (DOEs) in teacher certification, the rationale for defining portfolios as a certification test, the legal challenges being posed with regard to certification and high-stakes testing, the psychometric issues affecting certification testing and portfolios, and the history of portfolios used as high-stakes tests. At the end we will provide some caveats about portfolios as high-stakes tests, and we will conclude with some suggestions about the use of portfolios in training and high-stakes testing.

Mary Beth JoAnne Sues XYZ University

Mary Beth JoAnne, nicknamed MBJ, is a fictitious student who attends XYZ University, which is located in Florida, where teacher education programs must certify that their graduates have demonstrated all 12 of the Florida Educator Accomplished Practices (FEAPs). The FEAPs are very similar to the INTASC Principles. Florida has added two Practices, one on ethics and one on technology, which are embedded within the INTASC Principles. XYZ University requires candidates to successfully complete an electronic portfolio showcasing their work on the FEAPs. Here are the facts about MBJ and XYZ:

Mary Beth JoAnne is 35 years old, is a single mother of three, works 20 hours a week at TarMart, and has typically enrolled in 15-18 credit hours per semester.
She wants to get her teaching degree as quickly as possible so she can leave TarMart. She has the required GPA (with a 3.0), has passed the certification exam, and has successfully completed all requirements of the internship except the portfolio requirement.

Mary Beth JoAnne meets with the program coordinator, Jack, to challenge the result, since she has been given a U (unsatisfactory) in internship. The grade of U will prevent her from graduating and receiving her professional teaching certificate. Jack upholds his decision. There is no further appeals process.

XYZ candidates must have the required GPA, pass the State teacher certification exam, and successfully complete the portfolio and the final internship to graduate. If they successfully pass the State's background check, they are awarded a five-year professional certificate, renewable every five years thereafter.

XYZ's electronic portfolio includes 12 sections, one for each FEAP. At least three to five pieces of evidence are required for each Practice. The same evidence may be used for multiple practices. These requirements are properly documented in the XYZ portfolio materials, the catalog, and an advising sheet provided to students upon admission to the program.

For each piece of evidence, candidates reflect on their work, linking it to the appropriate FEAP. The burden of proof, therefore, begins with the
candidates; faculty either concur or do not concur with the students' reflection decisions, based on their re-evaluation of the work. Discussion about the FEAPs and strategies for writing reflections are integrated into the curriculum.

MBJ has attempted to complete the portfolio, but it was found to be unsatisfactory on two separate occasions in two sections. She failed to demonstrate the State's Practice on Critical Thinking (FEAP #4) because she was unable to provide any examples of elementary student work showing that they had learned to think critically in her classroom. She also failed to demonstrate the adequate use of technology (FEAP #12) in the portfolio itself.

There are orientations for students at the beginning of each semester to train them in the creation of their portfolios. The requirements are distributed or re-distributed at that time. Faculty are also trained on scoring the portfolios. Faculty advisors help candidates select their materials and sometimes provide candidates with the opportunity to fix their errors. Course syllabi provide advice on evidence that may be used in the portfolio, linking tasks to standards. The portfolios are reviewed prior to internship and at the end of internship.

XYZ uses a scoring rubric for the portfolios that asks faculty to determine if the candidates have demonstrated each of the FEAPs and selected indicators for those FEAPs. Inter-rater reliability has been established.

A fully equipped computer and materials lab is available Monday through Friday, 8 am to 5 pm.

The following are some scenarios invented to show what might happen if Mary Beth JoAnne decides to sue XYZ. Of course, there are many variables that remain unknown: testimony and dispositions, expertise and predispositions of lawyers and judges, etc. These scenarios are intended as food for thought.

Scenario #1

MBJ is Hispanic; her father is from Cuba, and her last name is Gonzalez.
She files a claim under Titles VI and VII of the 1964 Civil Rights Act. The results follow and are outlined in the steps used by the courts in such cases:

Step 1: XYZ analyzes the results of the portfolio evaluations, and a smaller percentage of Hispanics (70%) passed than non-Hispanic Caucasians (95%). The court determines that there is disparate impact on minorities (biased results) with this test.

Step 2: The burden of proof shifts to XYZ. MBJ claims that the portfolio could not provide valid evidence of her potential to perform in the classroom (i.e., to be certified). XYZ claims that the evidence is valid because the portfolio requirements were developed in direct response to the State's requirements, and it was organized around the State's FEAPs. The court finds as follows:

The court upholds XYZ on the decision about critical thinking, because the task is found to be job-related. The judge's opinion notes that the State places a heavy emphasis on teachers' ability to impact K-12 learning, and this is documented in both State Statute
and State Board of Education Rule. The K-12 students' work is found to be one of the best measures of effective teaching within an internship context. There is an appropriate relationship between the requirement and the purpose, thereby establishing some evidence of validity.

The court finds that XYZ does not meet its burden of proof, however, for several other reasons. The most significant of these is that XYZ cannot show the relationship between the creation of a teaching portfolio and what teachers actually do in the classroom, thereby failing to establish adequate evidence of validity. While research indicates that portfolios are used as appropriate vehicles for self-improvement and showcasing, and MBJ may eventually need to create a portfolio for national level certification through NBPTS, this is not a task she would do in her K-12 classroom to help children learn. In fact, many schools in Florida do not have computers. More important, the standard on technology requires that teachers use the technology within the context of instruction and the management of instruction. Therefore, this test does not meet the standards of representativeness or relevance for the 12th Accomplished Practice on Technology. It is not an authentic representation of the work to be performed by MBJ in the classroom and is, therefore, not job-related. The business necessity requirement for validity is not met.

The court also finds that the entire portfolio is not valid because the use of three to five pieces of evidence has not been validated for representativeness or relevance, nor was there any attempt on the part of the institution to look at issues of proportionality. Some evidence, and some practices, may be more important than others. Some may require more or less evidence to cover the depth and importance of the practice.
Furthermore, XYZ has no procedures in place to ensure that the evidence selected by each candidate will meet the requirements of representativeness, relevance, and proportionality (validity). The court finds that the inconsistency in the specific contents of the portfolios makes the validation of the test virtually impossible.

The court also finds that the institution has not used any research-based techniques to determine the cut-score on the portfolio evaluation that could be reasonably used to differentiate between the potentially competent and incompetent teachers. There is no rational support for equally weighting the items used in each practice, and there could be no such support since the items vary from candidate to candidate.

The court finds that instructional validity is also limited, since the preponderance of work on the portfolio was extra-curricular. MBJ did not have adequate opportunity to learn the skills needed to prepare a portfolio, and she was given inadequate opportunity to remediate. These are also issues related to fairness and due process. The fact that she was able to document lack of support for, and experience in, the technological issues for building the portfolio adds weight to this claim. Finally, the court determines that it is not reasonable to require MBJ to use university labs that are only available during
weekdays when she is a working adult. This impedes her opportunity to learn and succeed.

The court finds that the use of different pieces of evidence by different candidates makes it impossible for adequate reliability studies to be conducted.

Step 3: Not applicable, since MBJ prevails at Step 2. Step 3 addresses MBJ's rights to alternatives, and it is addressed below in Scenario #2.

Scenario #2

All of the contextual elements are the same; however, MBJ does not have very good lawyers. They do not make an effective case on all the aspects related to validity. Consequently, this time, XYZ prevails at Step 2. The trial moves to Step 3, and MBJ must prove that she was denied any reasonable alternatives. Remember Jack? He did not offer her any alternatives. MBJ now asserts that she should have been allowed to substitute some other technology-based work, e.g., the use of lessons infused with technology and the development of an electronic grade book.

In this scenario, MBJ prevails again. XYZ is unable to show that the alternatives would be less effective than the original requirement.

Scenario #3

All of the contextual elements are the same as in Scenario #1; however, MBJ is non-Hispanic Caucasian. Although females are a protected class, she knows that the statistics would not support a discrimination claim under Titles VI and VII. She does, however, have a due process claim under the 14th Amendment. She asserts that the bachelor's degree in elementary education is a property right of which she has been deprived without either substantive or procedural due process. The court finds the following:

MBJ's rights to substantive due process were abridged on the same issues of content validity as described in Scenario #1, and this is sufficient for her to prevail.

The procedural due process claim introduces new problems for XYZ. The court finds in MBJ's favor again on procedural due process because Jack's decision was not fair.
MBJ was given no alternatives and no opportunities for an appeal. He just said no. XYZ also takes no precautions against cheating and has no written policies about the assistance that faculty and peers can provide. Therefore, an unfair advantage is provided to some students who have multiple opportunities to revise their work and submit their portfolios, study with faculty who know how to use the technology and enjoy it, and receive substantive assistance from others.

The above scenarios do not address all of the things that can go wrong in a certification portfolio test. They do, however, provide some representative issues and results that may happen as the role of SCDEs continues to grow in the certification process. We will now address the contextual changes in
teacher certification that make the Mary Beth JoAnne case relevant to many SCDEs.

Contextual Changes: Shifting the Burden for Competence and Certification

By the late 1990s, all states had adopted or seriously considered increased curricular and/or testing standards for minimally competent student performance in elementary and secondary schools (CCSSO, 1998). Public attention shifted to teacher competency, and a new teacher certification testing movement arose in the South in the 1970s and 1980s. The movement eventually spread to the rest of the United States. Sireci and Green (2000) identified 45 states that now require prospective elementary and secondary teachers to pass a teacher certification test as a prerequisite for employment.

Testing requirements came in a variety of forms, including both paper and pencil tests and classroom observations for teachers. In 1980, Georgia became the first state to require an on-the-job performance assessment for certification of beginning teachers. Georgia implemented a three-part assessment tool used in addition to a multiple-choice certification test. This tool included a portfolio of lesson plans, an interview to discuss the portfolio, and an observation (McGinty, 1996). Florida, too, had an on-the-job performance observation system combined with a teacher certification test. This observation system was soon adopted or copied in other states, including Kentucky. It was called the Florida Performance Measurement System (FPMS).

These state assessments, both performance-based and traditional tests, met with some quick and negative results. Among the states challenged in court were Georgia, the Carolinas, Massachusetts, Texas, California, and Alabama. Despite the legal opposition, testing in one form or another has survived.
Pullin (2001) notes that one of the unusual aspects of teacher preparation has been that over the past fifty years, each of the states has delegated to public and private institutions of higher education much of the responsibility for awarding teaching credentials. States control the process of teacher education and certification through the state's mechanisms for approval of teacher education programs. Pullin (2001) asserts that once the state has approved the curriculum of an SCDE, program completion is tantamount to being certified or licensed.

Perhaps this shift in responsibility for certification is the direct result of the legal challenges faced by the States in the certification testing process. The suggestion here is that this delegation of responsibility from one state or public agency to another (in the case of public institutions) includes a shift in psychometric responsibility and legal liability. As Lee and Owens (2001) note, one of the greatest challenges faced by teacher education institutions today is to provide evidence that their candidates and graduates have demonstrated the knowledge, skills, and dispositions to support the learning and well-being of all students.

In the case of Florida, where Mary Beth JoAnne resides and attends XYZ University, the state has been characterized as one of five "bellwether" states in which new trends develop and as a "high change" state with a history of
reform. In this vein, it is not surprising that the Florida Legislature provided the SCDEs with an enormous challenge that is representative of other states now or in the future. The program approval statute has as its intent the provision to SCDEs of the freedom to innovate while being held accountable (Chapter 1004, Florida Statutes). Thus, the responsibility for testing candidates for certification is shared in states like Florida by the DOE and the SCDEs through state-administered exams and institutional assessments, both of which constitute a form of high-stakes testing. This has caused much consternation within the SCDEs in the State, as institutions wrestle with tough questions about what kinds of assessments they can use and how they can combine them into a decision leading to graduation and certification. We have previously noted the following:

In Florida, in teacher education, the State uses the program approval process to hold institutions accountable. Florida is serious about this. Florida's first continued program approval standard requires that 100% of teacher candidates demonstrate each of the 12 Accomplished Practices. No wiggle room. This high stakes requirement is causing institutions throughout the State to focus on how to operationalize the demonstration of competency for each of the Practices. The State of Florida has said to teacher preparation programs, "You must certify that your teacher candidates have learned what we require, and you must tell us how you know they learned it." (Wilkerson, 2000, p. 2)

Some readers may be reading this article from the perspective of preparing for NCATE accreditation in a state that does not require the institution to participate in the licensure decision.
It should be noted that there is a difference between meeting national accreditation standards that assure the public of quality teacher preparation programs and issuing a certificate or license to teach that assures the quality of teachers. The major premise of this article is that the legal and psychometric standards to be applied to the assessments differ based on the requirements and mission of the agency to which the unit is responding. That does not mean that the unit assessment system cannot be the same, but the more stringent needs must prevail if the institution wants to be safe.

NCATE allows SCDEs to work to establish the fairness, accuracy, and consistency of assessment procedures (NCATE, 2000, p. 21) to meet Standard 2. If the SCDE is offering a diploma that leads to teacher certification or licensure, however, the standard to be applied is significantly higher. The SCDE becomes both a test designer and test consumer. We are using the word "test." It is now appropriate to discuss the relationship of portfolios to testing.

Portfolios as Certification Tests

According to the definition of tests in the 1999 AERA/APA/NCME Standards for Educational and Psychological Testing, forms of testing may include traditional multiple-choice tests, written essays, oral examinations, and more elaborate performance tasks. Hence, portfolios that are composed of written reflections (a form of an essay) and products representative of the candidate's skills and performance fall under a professionally acceptable definition of
test. At another level, in her legal analysis of testing and certification issues, Pullin (2001), too, lumps together traditional tests and alternative assessments. Finally, the use of portfolios in high-stakes testing in states such as Georgia and Vermont lends further credibility to the classification of portfolios as a test. Hence, even if one does not typically think about a portfolio as a test, the classification of portfolios as a test is appropriate.

Since there are many perspectives on what a portfolio is or should be, a working definition for this article is needed. This article will use the one from Herman et al. (1992) that describes portfolio assessment as a strategy for creating a classroom assessment system that includes multiple measures taken over time. Portfolios have the advantage of containing several samples of student work assembled in a purposeful manner. Well-conceived portfolios include pieces representing both work in progress and showpiece samples, student reflection about their work, and evaluation criteria (p. 120).

A decision about whether or not someone is allowed to enter into, or remain in, a profession or occupation is what is commonly called a high-stakes decision. Mehrens and Popham (1992) define high-stakes tests as tests used for decisions such as employment, licensure, or high school graduation. They warn that when tests are used for high-stakes decisions, they will be subject to legal scrutiny. There is a strong possibility that individuals for whom an unfavorable decision is made will bring a legal suit against the developer and/or user of the test. They go on to note, however, that existing case law suggests that if tests are constructed and used according to existing standards, they should withstand that scrutiny.

Given the definition of a portfolio as a high-stakes test that serves, at least in part, to make a certification, licensure, or graduation decision, legal and psychometric issues apply.
This is also true of any assessment device used in such decisions, regardless of whether it is authored by a test company, a state agency, or an SCDE.

Herman et al. (1992) follow their definition of a portfolio with some concerns (from Arter and Spandel, 1992) that should be kept in mind when using portfolios or other comprehensive assessment systems. These concerns serve as a useful introduction to the more technically stated issues to be raised in this article. The six concerns are (p. 200):

1. How representative is the work included in the portfolio of what students really can do?
2. Do the portfolio pieces represent coached work? Independent work? Group work? Are they identified as to the amount of support students receive?
3. Do the evaluation criteria for each piece and the portfolio as a whole represent the most relevant or useful dimensions of student work?
4. How well do portfolio pieces match important instructional targets or authentic tasks?
5. Do tasks or some parts of them require extraneous abilities?
6. Is there a method for ensuring that portfolios are reviewed consistently and criteria applied accurately?
Psychometric Issues and Legal Challenges

We have raised the specter of legal challenges, and it is time to address the challenges that can be faced in any certification test, be it large-scale or small-scale, state-administered or institutionally designed and administered. Legal challenges are based upon the convergence of federal law and psychometric properties. It is difficult, if not impossible, to separate the two.

A review of the research written about legal challenges indicates that there are four basic legal issues: two challenges under the 1964 Civil Rights Act (Title VI and Title VII) and two challenges under the Fourteenth Amendment to the United States Constitution (due process and equal protection). Title VI supplements Title VII by reinforcing the prohibition against discrimination in programs or activities that receive federal funding, which includes most SCDEs through grants and financial aid (Sireci & Green, 2000; Pullin, 2001; Mehrens & Popham, 1996; McDonough & Wolf, 1987; Pascoe & Halpin, 2001).

Precedent-setting cases come from a variety of employment situations, both within and outside the field of education. Many challenges introduce psychometric issues, the chief of which is validity. The applicable guidelines and standards governing the psychometric properties of the test and the decisions made using the test, whether it be in the field of education or not, are based in educational psychology and measurement as well as employment guidelines. The two most influential resources that provide operational direction for these legal decisions are the 1999 AERA/APA/NCME Standards for Educational and Psychological Testing and the 1978 Uniform Guidelines on Employee Selection Procedures (Pascoe & Halpin, 2001).

Regarding the Civil Rights Act of 1964, Titles VI and VII forbid not only intentional discrimination on the basis of race, color, or national origin, but also practices that have a disparate impact on a protected class.
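
As a rough numerical illustration of disparate impact, the sketch below applies the four-fifths (80%) rule of thumb found in the 1978 Uniform Guidelines on Employee Selection Procedures to the hypothetical pass rates from Scenario #1 (70% and 95%). The function names and the default threshold parameter are our own illustrative assumptions, and the rule is only a screening guideline; a court may weigh other statistical evidence.

```python
def impact_ratio(focal_pass_rate: float, reference_pass_rate: float) -> float:
    """Ratio of the focal group's pass rate to the highest-passing group's rate."""
    if reference_pass_rate <= 0:
        raise ValueError("reference pass rate must be positive")
    return focal_pass_rate / reference_pass_rate


def flags_adverse_impact(focal_pass_rate: float,
                         reference_pass_rate: float,
                         threshold: float = 0.8) -> bool:
    """True when the ratio falls below the four-fifths guideline (assumed threshold)."""
    return impact_ratio(focal_pass_rate, reference_pass_rate) < threshold


# Hypothetical figures from Scenario #1: 70% versus 95% passing.
ratio = impact_ratio(0.70, 0.95)
print(f"impact ratio = {ratio:.3f}")        # impact ratio = 0.737
print(flags_adverse_impact(0.70, 0.95))     # True, since 0.737 < 0.8
```

Under these assumed figures the ratio is about 0.74, below the 0.8 guideline, which is consistent with the Step 1 finding in the fictitious case.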
Courts use a three-step process, in which the burden of proof shifts back and forth between the plaintiff and the defendant. We used these three steps in our analysis of the Mary Beth JoAnne case. In the first step, the plaintiff must prove discrimination. The discrimination could be either intended or coincidental, but it is clearly the responsibility of the institution to ensure that unintended discrimination (disparate impact) does not occur. This is why the results changed from scenario to scenario, depending on MBJ's ethnic background. She was a member of a minority group that was less successful than the majority population in the first scenario.

If discrimination has occurred, the defendant (SCDE) must demonstrate that the test was valid and is necessary, and this is most often linked to the job-relatedness (or the business necessity) of the test. It is in this second step that the legal and psychometric issues converge (Scenario #1 of MBJ). If the defendant proves in court that the test is valid, the plaintiff has one more chance to prevail. If he or she can prove that the defendant could have used an alternative test with equivalent results, the defendant will lose (Scenario #2).

There are two basic requirements in the U.S. Constitution's 14th Amendment
that apply to this context: equal protection and due process. For a plaintiff to win under the equal protection claim, it must be shown that there was intent to discriminate. This is difficult to show and, therefore, the claim is rarely used. Claims under the due process provisions, however, have become relatively common. These provisions forbid a governmental entity from depriving a person of a property or liberty interest without due process of law. (The Debra P. v. Turlington case established the diploma as a property right.) There are two kinds of these claims: substantive and procedural due process. Substantive due process requires a legitimate relationship between a requirement and the purpose. This is much easier to establish than the business necessity requirement of the Civil Rights Act. Procedural due process requires fairness in the way things are done, and this includes advance notice of the requirement, an opportunity for hearings/appeals, and the conduct of fair hearings. Psychometric properties are excluded from this claim. MBJ prevailed on both types of due process in Scenario #3 (Mehrens & Popham, 1992; Sireci & Green, 2000).

Thus, the linkage between legal rights and psychometric properties can occur in two places, opening the Pandora's box of validity and reliability. First, it can occur within the context of step two of a discrimination claim under Titles VI and/or VII of the Civil Rights Act where there is intended discrimination or disparate impact on a protected class. Second, it can occur within the context of a lack of a legitimate relationship between a requirement (e.g., a test) and a purpose (e.g., protecting the public from unsafe teachers) that constitutes a violation of substantive due process rights as assured by the Fourteenth Amendment of the U.S. Constitution.

There are other potential legal challenges as well, but they are beyond the scope of this article.
Worth mentioning in passing, however, is the potential for challenges by faculty who are asked to conduct extensive work, without remuneration, outside of their regularly assigned course-based teaching assignments (Sandmann, 1998). This is, of course, particularly problematic with portfolios completed and reviewed outside of the regular course teaching/assessing process.

Can It Happen At My Institution?

While most of the precedents discussed in the literature refer to states and traditional teacher certification tests, institutions have been challenged on the quality of educational opportunities received in their programs. Students have successfully used contract and negligence law theories in asserting institutional failures to provide the educational services they felt they should have received (Mellnick & Pullin, 2000). Now that institutions have received part of the burden of certification testing, this risk is increased and can readily be combined with the challenges previously encountered at the state level.

While courts generally hold that the policy of requiring successful performance on a teacher test is reasonable public policy, they scrutinize the tests and the test administration quite carefully. This scrutiny includes validity, reliability, and fairness. Even if a test is an appropriate measure of the knowledge and skills needed by teachers, it may not be a legal test if the cut score itself is not a valid indicator of teacher competence, set using professional standards (Mellnick &
Pullin, 2000). Since psychometric issues are so critical in preparing and administering a certification-related test, a discussion of the psychometric issues follows. The discussion is based primarily on the requirements established in the Standards for Educational and Psychological Testing (APA, AERA, NCME, 1999), which, along with the EEOC Guidelines (1978), is used consistently as the standard in legal disputes. It is important for faculty designing and implementing tests used in the graduation/certification decision to understand and apply these requirements. Selected issues are described below, and they are particularly targeted at the use of portfolios in high-stakes decisions.

Psychometric Issues

Test Design

Issues related to test design are addressed in Section 3 of the AERA/APA/NCME Standards (1999). A critical element noted in the Standards is the need to carefully specify the content of the test in a framework. This framework is sometimes called a table of specifications or, in the case of traditional tests, a test map or blueprint. The Standards provide specific guidance with regard to performance assessments in general and portfolios in particular. Performance assessments are defined in this section as those assessments that require the test takers to demonstrate their abilities or skills in settings that closely resemble real-life settings (p. 41). They may be either product-based or behavior-based.

The Standards note that performance assessments typically consist of a small number of tasks that establish the extent to which the results can be generalized to the broader domain. The use of test specifications contributes to a systematic development of tasks and helps to ensure that the critical dimensions of the domain are assessed, leading to a more comprehensive coverage of the domain than is typically achieved without the use of specifications.
The Standards also suggest that both logical and empirical evidence be gathered to document the extent to which the assessment tasks and scoring criteria reflect the processes or skills specified in the domain.

With regard to portfolios, the Standards define portfolios as systematic collections of work gathered for a specific purpose. They note that those who assemble the portfolios may select their own work, if that is appropriate to the purpose. However, the following caution is provided: "The more standardized the contents and procedures of administration, the easier it is to establish comparability of portfolio-based scores. Regardless of the methods used, all performance assessments are evaluated by the same standards of technical quality as other forms of tests" (p. 42).

Validity, Sampling, and Job-Relatedness

Section 14 of the AERA/APA/NCME Standards (1999) outlines the requirements for testing in employment and credentialing, focusing on the
applicant's current skill or competence, including entry into a profession and ranging from novice to expert in a given field. It is, therefore, one of the most relevant chapters in the Standards.

The Standards explain that licensing and certification requirements are imposed by state and local governments to ensure that those licensed or certified possess essential knowledge and skills in sufficient degree to perform their work safely and effectively, thereby protecting the public from non-qualified personnel. Tests used for this purpose are intended to provide the public with a dependable mechanism for identifying practitioners who have met particular job-related standards. Standard 14.14 requires that the content domain to be covered by a credentialing test should be defined clearly and justified in terms of the importance of the content for credential-worthy performance in an occupation or profession (AERA/APA/NCME, 1999). This is the basis for making the link in substantive due process claims between the requirement and the purpose, with the purpose referring to the State's role in certification to protect the public, as delegated to SCDEs, and the requirement referring to the test, including portfolios.

The content domain to be covered by a licensure or certification test should be defined clearly and explained in terms of the importance of the content for competent performance in an occupation (AERA/APA/NCME, 1999). The creation of the test requires that the author develop and implement a content sampling process. Construct-irrelevant variance refers to the degree to which test scores are affected by processes that are extraneous to the intended construct.
Construct under-representation refers to the degree to which a test fails to capture important aspects of the construct. It implies a narrowed meaning of test scores because the test does not adequately sample some types of content (AERA/APA/NCME, 1999).

A content validation examination can be conducted to determine if a representative sample of the domain of skills needed to perform the job is covered adequately, which is often referred to as job-relatedness. To content validate a test, the test writers would examine all elements of the test and try to ascertain how well the test covered the essential areas of knowledge and skill. The extent to which the content is underrepresented or irrelevant becomes a critical concern. Proportionality of the items is another important issue. In order to meet the criteria of representativeness and proportionality, the test must reflect the entire breadth of the domain and it must place the greatest emphasis on the most significant aspects within the domain. A test that sampled knowledge or behavior from only part of a domain would not be representative. A test that put great weight on insignificant or marginally related aspects of a domain would be disproportionate. In recent years, these issues have been of major concern in determining the legal defensibility of employment, licensing, and certification tests (McDonough & Wolf, 1987; Sireci & Green, 2000). Thus, both sufficiency and relevancy are critical issues in test construction and were critical issues in the MBJ case, Scenario #1.

AERA/APA/NCME Standard 14.4 requires that all criteria used should represent important work behaviors or work outputs, on the job or in job-relevant training, as indicated by an appropriate review of information about the job. Standard
14.9 requires that when evidence of validity based on test content is a primary source of validity evidence in support of the use of a test in selection or promotion, a close link between test content and job content should be demonstrated. The rational relationship between what is measured on a certification test and what practitioners actually do on the job is usually established by conducting a thorough practice (or job) analysis. The practice analysis can be thought of as a very detailed job description, breaking down a profession into performance domains that are characterized by specific tasks. The tasks are further delineated into knowledge and skill statements that represent the essential qualities needed to perform each task (Sireci & Green, 2000; Pullin, 2001; AERA/APA/NCME, 1999).

When a test attempts to sample a work behavior or to review a sample work product, then these should approximate the real-world work setting as much as possible (EEOC, 1978). Not all aspects of job performance need to be tested, but generally a test should be fairly representative of the job in question, and courts may look more closely at tests which sample only a small part of the total job. The ADA requires that selection decisions be based upon the "essential functions" of a job (Pullin, 2001).

Although psychometricians and courts now have a disparity of opinion about what validity is, content validity remains the primary evidence used by courts when making decisions about fairness (Pascoe & Halpin, 2001). Job relevance has been an important issue in many of the employment test cases of the past twenty years. The valid use of the test is based on a clear understanding of the rational relationship between the test and the knowledge, skills, and abilities required to do the job (McDonough & Wolf, 1987).
The job-relatedness standard was a major factor in the court finding for Mary Beth JoAnne on the technology issue.

Lee and Owens (2001) asserted that most educational institutions and training and development companies do not conduct validity studies for two reasons: lack of skill in conducting these studies and fear of spending the money it takes. They concluded that if only those companies who had been sued for unfair business practices had considered the alternative costs of defending themselves in court, they might have decided to learn how to conduct the needed studies.

Reliability and Measurement Error

Reliability refers to the consistency of measurements when testing procedures are repeated. It is assumed that there is a degree of stability in scores, but there also needs to be an accounting for measurement error, or score variability that is not related to the purposes of the measurement. Measurement error can come from differences in the difficulty of different test forms (e.g., different work samples in different students' portfolios); fluctuations in motivation, interest, or attention; and intervention, learning, or maturation (e.g., uneven help in completing tasks and assembling portfolios).

The APA/AERA/NCME Standards (1999) specifically address the recent development of performance assessments in large-scale testing and portfolios in
particular, especially those in which examinees select their own work or work cooperatively in completing the test. They note that "Examinations of this kind raise complex issues regarding the domain represented by the test and about the generalizability of individual and group scores. Each step toward greater flexibility almost inevitably enlarges the scope and magnitude of measurement error" (p. 26). This was the case at XYZ University.

The Standards indicate that information about measurement error is essential, whether the test is of a traditional nature or is a portfolio of work samples or another form of performance assessment. No test developer is exempt from this responsibility (p. 27). Critical information to be obtained includes the sources of measurement error; summary statistics on the size of such errors; and the degree of generalizability across alternate scores, forms, and administrations. Where there is significant subjectivity, indexes of scorer consistency, often called inter-rater reliability, are also common.

It should be clear from the above that there are multiple issues related to reliability that need to be studied. Inter-rater reliability is but one of these issues. Many factors can contribute to error, including rater training, rater mood or fatigue, unclear directions, number of items, variations in the types or difficulty of evidence evaluated (e.g., student-selected evidence in portfolios), unequal assistance provided to candidates, cheating (those portfolios that are being sold or distributed on campus or on the Internet), and other such factors. Institutions that rely almost exclusively on an inter-rater reliability study to handle the psychometric requirements are in jeopardy.
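To make the idea of an index of scorer consistency concrete, the following is a minimal sketch of Cohen's kappa for two raters, which corrects raw percent agreement for the agreement expected by chance. Kappa is chosen here only for illustration; the Standards do not prescribe a particular index, and the function name and the ratings are hypothetical. Consistent with the caution above, such an index speaks only to rater agreement and says nothing about the other sources of error or about validity.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same set of portfolios."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    n = len(rater_a)
    # Observed agreement: share of portfolios given the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions,
    # summed over rating categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail decisions by two faculty raters on ten portfolios.
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "pass", "fail", "fail", "fail",
           "pass", "pass", "pass", "pass", "pass"]

print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")
```

In this made-up example the raters agree on 8 of 10 portfolios, yet kappa is noticeably lower than 0.80 because both raters pass most candidates, so much of the agreement could arise by chance.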
There may also be many other sources of error that go undetected, especially if faculty evaluators are just plain tired from reading so many portfolios or angry that they are being forced to do it just for accreditation purposes. Combined with the potential for a lack of validity if the evidence provided is either construct-irrelevant or underrepresented, it may be that those high inter-rater reliability scores only indicate that raters who are tired are consistently rating highly (halo effect) the wrong stuff just to get finished.

Cut-Scores

As an SCDE or a DOE develops its tests, faculty must ask whether or not the content measured is relevant to making the decision about minimal competence and the potential for adequate performance on the job. The portfolio, or any other assessment device, in this context, is a qualifications test, targeted at sorting those who should be allowed to teach from those who should not, based on what they will be expected to do on the job.

Designing the testing program includes deciding what areas are to be covered, whether one or a series of tests is to be used, and how multiple test scores are to be combined to reach an overall decision about whether or not the examinee is likely to engage in safe and appropriate practice. It is not only the internal aspects of the test that must be judged valid, but also the way in which the test is used to identify masters and non-masters or successes and failures. Defining the minimum level of knowledge and skill required for licensure or certification is one of the most important and difficult tasks facing those responsible for credentialing. This is accomplished by identifying and verifying a cut score or
scores on the tests and is a critical element in validity. The cut score must be high enough to protect the public, as well as the practitioner, but not so high as to be unreasonably limiting.

AERA/APA/NCME Standard 14.13 requires that when decision makers integrate information from multiple tests or integrate test and non-test information, the role played by each test in the decision process should be clearly explicated, and the use of each test or test composite should be supported by validity evidence. In some cases, an acceptable performance level is required on each test in an examination series. Standard-setting procedures (e.g., Angoff) are designed to determine passing scores that distinguish those worthy of a credential from those who are not (AERA/APA/NCME, 1999; McDonough & Wolf, 1987; Sireci & Green, 2000; Kane, 1994).

Section 4 of the Standards also provides guidance on this issue. In the case of licensure or certification, the cut score should represent an informed judgment that those scoring below it are likely to make serious errors because of their lack of knowledge or skills. The most difficult part is weighing the relative probabilities of false negatives (keeping good candidates out of the profession) and false positives (letting poor candidates into the profession). Because this is largely a value-laden and subjective procedure, the qualifications of the judges used in standard setting are extremely important.

Fairness

The APA/AERA/NCME Standards (1999) outline four basic views of fairness. Three are discussed here because of their relevance to this article: lack of bias, equitable treatment in the testing process, and opportunity to learn.

Bias can occur when there is evidence that scores are different for identifiable subgroups of the population tested. Bias is determined by the response patterns for these groups.
If a protected population (e.g., minorities, women, or individuals with disabilities) performs worse than the majority population, bias is an issue (MBJ Scenario #1). Bias may also occur as a result of the content of the test itself. The language of the material may be emotionally disturbing or offensive or may require knowledge more common to a specific group of examinees. Bias can also occur with a lack of clarity in instructions or with scoring rubrics that credit responses more typical of one group than another. Another form of bias relates to the responses provided by the examinees. For example, if the examinees answer the way they think the scorers want, bias is an issue. A portfolio reflection reviewed for dispositions toward teaching, for example, could be filled with what the candidate thinks the professors want to see rather than what the candidate really believes.

Equitable treatment refers to the manner in which the test is administered. All examinees need to have comparable opportunities to demonstrate their ability, and this includes testing conditions, familiarity with format, practice materials, etc. Opportunities to succeed must be comparable. There must be equity in the resources available, and all examinees need to have meaningful opportunities to provide input to decision makers about procedural irregularities. In the case of portfolios, if one candidate has more opportunities than another to succeed
or to challenge, based on the support provided by faculty, fairness becomes an issue. These, too, were issues for MBJ in Scenario #3.

Opportunity to learn requires that the institution assure that what is to be tested is fully included in the specification of what is to be taught. In the case of portfolios, then, where reflections are written after instruction is completed and are a critical component of the scoring, institutions would need to ensure that candidates had adequate opportunities to learn how to self-assess at the level expected in the portfolio. Candidates would also have to have had adequate opportunities to produce sufficient materials in class to provide evidence of standards demonstration. Candidates also would need adequate opportunities to fix problems.

Legal Issues and Precedents

It is difficult to remain informed about current legal practice with regard to professional licensure, but it is important (Pascoe & Halpin, 2001). The courts have granted governmental authorities wide latitude as long as they have taken reasonable steps to validate the tests and the cutoff scores. Whether the plaintiffs are minorities or members of the majority population, other steps that the courts have considered include (1) providing ample prior notice before implementation of the high-stakes phase; (2) allowing accommodations in the administration of the tests for the disabled; and (3) allowing retesting and, to the extent feasible, remediation (Zirkel, 2000).

Courts recently have been supportive of performance measures (Lee & Owens, 2001; Rebell, 1991). As far as teacher educators are concerned, Pullin (2001) notes that the courts have been generally reluctant to second-guess educators' judgments of educational performance based on subjective evaluations. This is of some comfort to the teacher education community.
She goes on to say that in situations in which the individual stakes are not as high, such as during an educator preparation program or during a probationary period of employment, fewer procedural protections are required. If the decision-making seems to be based upon the purely evaluative judgments of qualified professionals, courts may be reluctant to intervene. On the other hand, how can we be sure?

Lemke (2001), too, offers an opinion. She reviewed court decisions concerning the dismissal of college students from professional programs and determined that courts upheld school decisions when the institution followed its own published processes and the students' rights had been observed. This, too, provides for a high degree of comfort. If students are told what is expected of them in clear terms, colleges are safer. But Lemke also found that there is a lack of information about what the judicial system finds to be appropriate and inappropriate admissions and dismissal procedures. She looked at the decision of Connelly v. University of Vermont (1965), in which the federal district court ruled that it is within the purview of academic freedom for faculty to make decisions about students' progress. Faculty and administrators were described as uniquely qualified to make these decisions. In those days, though, certification was still the purview of the state. Lemke also reviewed eight cases of students filing against institutions. In these cases, the institution had the right to make decisions about a student's academic fitness as long as it followed
its advertised processes. Reasons for dismissals that were upheld included the use of subjective assessments in clinical experiences, time requirements for program completion, comparison of test scores between the plaintiff and peers, GPA, and absenteeism.

Educators in Florida, though, have seen that the PK-20 system is not so safe. The groundbreaking Debra P. v. Turlington case (1979, 1981, 1983, 1984) begins to reduce the level of comfort engendered in the previous two citations. This was a diploma sanction case, bringing educators back to the issue of content validity. It is generally conceded that a state has the constitutional right to use a competency test for decisions regarding graduation. A diploma is considered a property right, and one must show some evidence of curricular/instructional validity or what is also called "opportunity to learn" or "adequacy of preparation." In this case, both the due process and equal protection clauses of the 14th Amendment were found to be violated by Florida officials who were using a basic skills test for diploma denial at the high school level. In appeals, additional issues were raised about whether the test covered material that was adequately covered in Florida's classrooms, and this has become the major precedent for looking at "instructional or curricular" validity. The judge ruled that "What is required is that the skills be included in the official curriculum and that the majority of teachers recognize them as being something they should teach" (Mehrens and Popham, 1992). In the MBJ case, XYZ required candidates to prepare their portfolios outside of their regular courses, thereby increasing their risk of challenge based on the principle of opportunity to learn.

The continuing shift in responsibility to SCDEs from DOEs for more and more of the burden of making certification decisions can easily result in successful claims by unhappy students who are denied their career dreams.
The diploma denial challenges, combined with the challenges based on denial of a teaching certificate by a state agency, provide for a natural leap to challenging diploma/certificate denial by an SCDE.

McDonough and Wolf (1988) identified five issues around which litigation against educational testing programs occurs: (1) the arbitrary and capricious development or implementation of a test or employee selection procedure, (2) the statistical and conceptual validity of a test or procedure, (3) the adverse or disproportionate impact of a testing program or selection procedure on a "protected group," (4) the relevance of a test or procedure to the identified requirements of the job (job-relatedness), and (5) the use of tests or selection procedures to violate an individual's or group's civil rights (McDonough & Wolf, 1987).

Courts have generally required evidence that the cut-score selected for a test be shown to be related to job performance. In the Alabama case against National Evaluation Systems (NES), the test developers, the court found that the company engaged in practices "outside the realm of professionalism" and that it violated the minimum professional requirements for test development. Among the problems found were decisions in test development that resulted in test scores that were arbitrary and capricious and bore no rational relationship to teacher competence. There was a similar finding in Massachusetts against
the same company. In Groves v. Alabama Board of Education, the court found in 1991 that the arbitrary selection of a cut-score without logical or significant relationship to minimal competence as a teacher had no rational basis or professional justification. As such, it failed to meet the requirements of Title VI and was not a good faith exercise of professional judgment. Evidence should be available that the cut-score for a test does not eliminate good teachers from eligibility for teaching jobs (Pullin, 2001).

The California Basic Educational Skills Test (CBEST) was challenged in 1983 under Title VII by the Association of Mexican-American Educators. The State won the case based on a job-relatedness study (Zirkel, 2000). In 1984, Florida lost a challenge to FPMS when the question of the validity of the decision about a teacher's certificate removal was successfully raised (Hazi, 1989). Georgia's TPAI challenge was won by the plaintiff based on due process and validity challenges (McGinty, 1996). The U.S. Department of Justice sued the State of North Carolina in 1975 under Title VII based on results on the National Teacher Examination from the Educational Testing Service. The Department won the claim when the court found the test to be unfair and discriminatory because a validation study had not been conducted and the passing score was arbitrary, thereby denying equal protection. A second, similar claim was filed against the State of South Carolina, but in this instance the state prevailed based on a proper validation study, causing the test to be deemed fair and appropriate (Pascoe & Halpin, 2001). There are many such discussions in the literature. The point is that tests, even those written by major test publishers, can be successfully challenged.

What History Tells Us About Using Portfolios as High-Stakes Tests

Before proceeding any further, it is important to underscore that the authors are not opposed to portfolios in a general sense.
This article is about portfolios used in a certification testing context, particularly when there is a high degree of flexibility allowed to students in the selection of portfolio contents. There is much in the literature to support the use of portfolios as a tool for learning, particularly the reflective or self-assessment aspect. As noted earlier, they are excellent means for documenting growth, improving instruction and learning, and causing students of any age to construct meaning and value their own progress at meeting important instructional goals. For formative assessment, they can be superior assessments. For example, when Vermont implemented its K-12 statewide portfolio assessment system in 1988 as the first attempt in the U.S. to use portfolio assessment as a cornerstone of a statewide assessment, the results in these areas were clear and strong. The studies by the RAND Corporation and the Center for Research on Evaluation, Standards, and Student Testing (CRESST) (Koretz, 1994) clearly indicated that teachers thought that portfolios were helpful as informal classroom assessment tools but that they, too, were worried about their use for external assessment purposes. The majority of teachers surveyed agreed or strongly agreed that portfolios help students monitor their own progress. However, the vast majority did not believe it would be fair to evaluate students on the basis of their portfolio scores. Most felt that the state's emphasis on reliable scoring was misguided and perverted
the original purpose of portfolios as a tool for assessing an individual student's growth. Teachers were concerned about the validity of portfolios as an assessment instrument, particularly because of the large number of uncontrolled variables and the time burden both in class and outside of class. They felt they spent too much time managing and scoring portfolios and this detracted from their time to teach (Koretz, 1994).

In a subsequent study by Gearhart and Herman (1995), further support was given for the significant benefits for instructional reform being witnessed in Vermont, but the challenges were reinforced with the question about whose work was being judged when the work was composed with the support of peers, teachers, and others. They noted that to many committed to educational reform, portfolio assessment embodies a vision of assessment integrated with instruction. Advocates find that portfolios provide a richer and truer picture of students' competencies than do traditional or other performance-based assessments by challenging teachers and students to focus on meaningful outcomes. Integrated with instruction and targeted at high standards, the portfolio is seen by its advocates as the bridge between improved teaching and accountability. However, while the vision is enticing, Gearhart and Herman (1995) asked if it would work. The RAND study raised major issues about reliability; this study brought into question the validity of inferences drawn when the assessment results are compromised by questions about authorship and support. They concluded that from a measurement perspective, the validity of inferences about student competence based solely on portfolio work appeared suspect. The problem is troubling indeed for large-scale assessment purposes where comparability of data is an issue.

Questions about using portfolios in high-stakes assessments have also been raised in the teacher certification arena.
The Georgia Teacher Performance Assessment Instrument (TPAI), initiated in 1980, included a portfolio component and an observational component as an interview. The TPAI was initially successfully challenged by a teacher (Kitchens) for the validity of its observational component, which was found to include behaviors that were difficult to measure (e.g., enthusiasm). However, in the aftermath of the Kitchens case, the opposition to the TPAI that grew centered on the portfolio process, which was again found to be far too time consuming for a beginning teacher and not a valid measure of teacher performance because the portfolios were being judged on the basis of form rather than substance. The $5,000,000 "mammoth measurement tool" was laid to rest (McGinty, 1996).

At the institutional level, after the Alabama decision to terminate state testing because of racial bias, Nweke and Noland (1996) investigated the effectiveness of using performance and portfolio assessment techniques to diversify assessment in a minority teacher education program at Tuskegee University. They concluded that the observational component correlated highly with GPA but there were no statistically significant relationships between portfolios and GPA or portfolios and the performance assessment.

This is a representative, not an exhaustive, study of the use of portfolios in large and small scale assessments. Despite findings such as these, though, portfolios continue to be a major component of teacher assessment systems in SCDEs.
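
The kind of relationship examined by Nweke and Noland, whether portfolio scores move together with other measures such as GPA, can be illustrated with a correlation coefficient. The following is a minimal sketch only: the scores are invented for this example and are not data from any study cited here, and the function name `pearson_r` is a hypothetical helper. A coefficient near zero would suggest that portfolio ratings carry little information about the other measure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for eight candidates (invented for illustration only).
gpa         = [2.6, 2.9, 3.0, 3.2, 3.4, 3.5, 3.7, 3.9]
observation = [2.0, 2.5, 2.4, 3.0, 3.1, 3.3, 3.6, 3.8]  # observation ratings
portfolio   = [3.0, 2.0, 4.0, 3.0, 2.0, 4.0, 3.0, 3.0]  # portfolio ratings

print(f"observation vs GPA: r = {pearson_r(observation, gpa):.2f}")
print(f"portfolio vs GPA:   r = {pearson_r(portfolio, gpa):.2f}")
```

With real candidate data, a significance test and a far larger sample would be needed before drawing any conclusion about the validity of either measure.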
AACTE (American Association of Colleges of Teacher Education) conducted a survey of member institutions in fall 2001 (Salzman, et al., 2002) on teacher education outcomes measures. The purpose of the study was to identify and describe what SCDEs are doing to meet the requirements for outcomes assessment for unit accreditation and program approval (teacher certification). They concluded that institutions are responding to more rigorous standards and to national and state mandates for accountability through multiple types of outcome measures, including portfolios. Results from the 370 responding institutions indicated that portfolios are used as an outcome measure by 319 (87.9%) of the responding institutions. Responses further indicated that 64 (20.1% of the institutions using portfolios) do so in response to a state mandate while 269 (84.3%) do so as part of an institutional mandate. Portfolios were noted as required for certification or licensure by 123 (38.6%) institutions and not required for licensure by 159 (49.8%). Data were missing from 37 (11.6%) of the respondents. Most units (305 or 95.6%) reported that the portfolio requirements were developed by the unit (Salzman, et al., 2002).

There is a school of thought that advocates strongly for portfolios. With Portfolio in Hand, a recent work edited by Nona Lyons (1998), contains several important chapters advocating for portfolios. Even in these chapters, the caveats exist. For example, although Moss proposes that validity issues related to assessment of teaching be rethought to allow for the benefits of portfolio assessment, she concludes with suggestions from classical theory. On the one hand, she proposes an integrative or hermeneutic approach to portfolio assessment in which raters engage in a dialogue to reach consensus about ratings, but she acknowledges that this is a time-consuming approach for which substantial empirical work is needed to explore both the possibilities and limitations.
Even with this proposed new approach, Moss acknowledges the need to ensure the relevance, representativeness, and/or criticality of the performances and criteria as well as job-relatedness, social consequence studies, lack of bias, reliability, and most other aspects of psychometrics. Dollase (1998), while advocating for portfolios in teacher certification, also acknowledges the severity of the issue of time in terms of the doability of the approach.

Requirements and Caveats Regarding the Use of Portfolios as Certification Tests

To this point, arguments have been made that the problems associated with portfolios in a high-stakes testing environment center around validity, reliability, fairness, excessive time burdens, and loss of the meaning and value of portfolios as a viable means to improve learning. Based on this analysis of the literature, the authors have identified eight requirements for the construction of portfolios as tests used for certification in an SCDE. These are accompanied by caveats related to the use of portfolios for SCDE-based certification decisions. They are listed in Table 1.

Table 1. Requirements and Caveats for Portfolio Use in Certification Testing in an SCDE
1. Requirement for Tests: The knowledge and skills to be demonstrated in the portfolio/test must be essential in nature. They must represent important work behaviors that are job-related and be authentic representations of what teachers do in the real world of work.
   Caveats for Portfolios: If the portfolio is used as a test itself containing new or original work created outside of courses, rather than just a container of evidence of course-embedded tasks, the portfolio must stand the test that it is job-related and authentic. The SCDE should be prepared to defend how portfolio preparation as a stand-alone activity is a critical job function that teachers perform on a routine basis, similar to lesson planning, communication with students and parents, assessment, teaching critical thinking skills, etc. In the case of electronic portfolios, if the product is used to demonstrate a standard or expectation on technology that relates to using technology in the classroom, the SCDE will need to justify that the preparation of the portfolio is equivalent to what teachers do with technology in the classroom. This may be difficult from an authenticity perspective.

2. Requirement for Tests: The entire portfolio/test (assessment system) must meet the criteria of representativeness, relevance, and proportionality.
   Caveats for Portfolios: If the portfolio is a container of evidence used as a summative assessment for the certification/graduation decision, the SCDE must be prepared to defend the contents of portfolios submitted by all candidates for the representativeness, relevance and proportionality of contents against the requirements of the teaching profession, e.g., the standards being assessed from national and state agencies as well as the institution itself (conceptual framework). If the portfolio is a specific piece of evidence itself, then its place within the assessment system must be included in the analysis of representativeness, relevance, and proportionality. All criteria used to evaluate the portfolio must be relevant to the job. Criteria such as neatness and organization are particularly suspect, unless they can be directly tied to the potential for poor performance in the classroom. The SCDE will need to prove that sloppy or disorganized teachers cannot be effective teachers.

3. Requirement for Tests: There must be adequate procedures and written documents used to provide notice to candidates of the requirements, the appeals process, and the design (fairness) of the appeals process.
   Caveats for Portfolios: The SCDE must have adequate documentation in place that tells candidates how and when to prepare the portfolio, how it will be reviewed, who is allowed to help them and how much help they can receive, the consequences of failure and the opportunities for remediation, and what their due process rights and procedures are if they wish to challenge the review results.

4. Requirement for Tests: There must be adequate instructional opportunities provided to candidates to succeed in meeting the requirements of the portfolio/test and to remediate when performance is inadequate.
   Caveats for Portfolios: The SCDE should embed portfolio preparation, including the contents of the portfolio, into its instructional program (i.e., coursework). Any requirements outside of the instructional program could be subjected to a claim based on instructional/curricular validity. The entire faculty need to buy into, and support, portfolio preparation activities of the students and provide remedial opportunities for components that are found lacking.

5. Requirement for Tests: There must be a realistic cut-score for determining if the performance is acceptable. This cut-score must differentiate between those who are competent to enter the profession and those who are not.
   Caveats for Portfolios: This is the most difficult aspect of portfolio design. The SCDE will need to identify the specific score or characteristics that sort teachers into the dichotomous categories of competent and not competent based on their portfolios.

6. Requirement for Tests: Alternatives must be provided to candidates who cannot successfully complete requirements, or the SCDE must be able to demonstrate why no alternatives exist.
   Caveats for Portfolios: If the portfolio is a container of evidence, the alternatives must relate to specific pieces of evidence. The institution must ensure, however, that alternatives do not detract from the representativeness, relevancy and proportionality criteria. If the portfolio is used as evidence of a specific standard, such as reflection, then an equivalent alternative should be identified if at all possible.

7. Requirement for Tests: The results of the portfolio evaluation (scoring) and the extent to which protected populations are equally or disproportionately successful must be monitored.
   Caveats for Portfolios: If the SCDE finds that a disproportionate number of protected populations (minorities, handicapped, women) do not successfully complete the portfolio assessment process, the SCDE must prepare to defend its use of the portfolio in terms of all of the above requirements 1-6 and show why no alternatives exist or are offered to the protected classes.

8. Requirement for Tests: The process must be implemented and monitored to ensure reliable scoring and to provide for adequate candidate support.
   Caveats for Portfolios: Tests of reliability must be performed and samples of candidate work and faculty scoring must be reviewed on a regular basis to ensure that procedures and scoring are not drifting and to minimize measurement error. Raters need to be trained and updated on a regular basis. Directions need to be clear. Portfolios across candidates need to be comparable in difficulty. Rater mood and fatigue need to be carefully monitored. Safeguards against cheating need to be implemented. The sufficiency of items in the portfolio must be adequate. Records should also be kept of all exceptions made, alternatives provided, due process proceedings, and faculty/candidate training.

Do Portfolios Have a Place in Teacher Training and Certification?

Portfolios remain an excellent assessment device to support learning. Questions raised in this article relate to the use of portfolios for summative certification decisions for all or most standards combined, especially when contents vary widely. When contents are the same across students, then questions can be raised about what purposes the portfolios actually serve. Is a checklist enough to determine if all work is completed satisfactorily? If so, could some other type of tracking system be used that provides less burden on both faculty and students? If the reflective aspect is considered essential, are there other forms of reflection that might serve equally well, such as a professional development plan? The professional development plan would be a job-related task in any state where districts require teachers to develop such plans.
It is a widely accepted, and research supported, view that teachers who identify their own strengths and weaknesses as well as those of their students are better practitioners than those who do not do so. Typically, teachers participate in professional development planning and activities for improvement purposes in most states and school districts.

There are also some instances in which portfolios can be used to assess specific skills that have been accepted as critical to effective teaching. These instances can help to differentiate between the competent and the incompetent teacher and are job-related. For example, a portfolio of K-12 student work, used to assess the extent to which a teacher candidate can teach students to think critically and creatively, would be an appropriate test. This is clearly a job-related task, since the teacher is required in most states and school districts to demonstrate that children are learning.

These authors are suggesting a more limited and focused use of portfolios: portfolios to measure specific, job-related skills. By limiting the use and complexity of portfolios, the long-known values of portfolio assessment can be realized without burdening faculty and students with excessive requirements that have limited use and without taking serious psychometric and legal risks.

Conclusions and Implications
The shift of responsibility from state departments of education to teacher preparation programs has increased the likelihood that SCDEs will face legal challenges when candidates are denied diplomas and certification/licensure based on the tests used in the academic program. Particularly vulnerable are the cumulative or showcase portfolios currently being required in many SCDEs as evidence of candidate demonstration of standards and competency. When these portfolios are used as a measure of job performance themselves, or when they are evaluated using criteria that are related in only tangential ways to authentic job tasks, or when they are not substantially related to standards required for state program approval, or when they are prepared as an extra-curricular activity, or when they contain student-selected evidence, or when they are not adequately monitored for reliability or bias, the threat of litigation increases as the SCDEs fail to pay attention to psychometrics. New standards make psychometric qualities more important than ever for avoiding challenges. High-stakes testing has acquainted an army of students and lawyers with the details of tests, so it is easier to sue. To avoid litigation, SCDEs must carefully consider the design and implementation of portfolios and should consider a heavier reliance on individual tasks that are combined in a way that leads to an appropriate decision or cut score that differentiates between candidates who are likely to be competent teachers and those who are not.
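
One way to picture individual tasks being combined into a decision with a cut score is the minimal sketch below. Everything in it is assumed for illustration: the task names, the 1-4 rubric scale, the cut score of 3.0, and the rule that every task must also reach a minimum rating are hypothetical choices, not values proposed in this article. An SCDE would have to set and defend its own rule.

```python
from dataclasses import dataclass

# All values below are hypothetical illustrations, not recommendations.
CUT_SCORE = 3.0        # assumed minimum mean rating on a 1-4 rubric scale
MIN_TASK_RATING = 2    # assumed floor that every critical task must reach

@dataclass
class TaskResult:
    task: str      # course- or internship-embedded task
    rating: int    # faculty rating on the assumed 1-4 scale

def certification_decision(results):
    """Combine task ratings into one pass/fail decision.

    Passes only if the mean rating meets the cut score AND no single
    task falls below the per-task floor.
    """
    if not results:
        raise ValueError("no task results to evaluate")
    mean_rating = sum(r.rating for r in results) / len(results)
    weak_tasks = [r.task for r in results if r.rating < MIN_TASK_RATING]
    passed = mean_rating >= CUT_SCORE and not weak_tasks
    return {"mean": mean_rating, "weak_tasks": weak_tasks, "passed": passed}

# Hypothetical candidate record (invented for illustration only).
candidate = [
    TaskResult("lesson plan", 4),
    TaskResult("classroom assessment", 3),
    TaskResult("parent communication", 3),
    TaskResult("critical thinking unit", 3),
]
print(certification_decision(candidate))
```

The per-task floor is one possible design choice: it keeps a very weak performance on one critical skill from being offset by strong ratings elsewhere. Whether such a rule is appropriate is itself a validity question that the institution would need to document.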
The use of key course and internship-embedded tasks that measure critical skills, that are reviewed to ensure that they are representative and relevant job-related measures of the domains, that are evaluated by the faculty who assign them, that are tracked through student records (paper or electronic), and that are combined in meaningful ways to establish which candidates are likely to be competent teachers holds far better promise of satisfying the psychometrics and keeping the big and little children safe in both their university and K-12 classrooms.

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for educational and psychological testing.

Arter, J. and Spandel, V. (Spring 1992). Using Portfolios of Student Work in Instructional Assessment. Educational Measurement: Issues and Practice 11(1), 36-44. (cited in Herman, et al., 1992).

Council of Chief State School Officers (1998). Key state education policies in K-12 education: Standards, graduation, assessment, teacher licensure, time, and attendance: A 50-state report. Washington, D.C.: Author.

Dollase, Richard H. (1998). When the State Mandates Portfolios: The Vermont Experience. In Lyons, N. (Ed.) (1998). With Portfolio in Hand: Validating the New Teacher Professionalism. Teachers College Press, New York, NY.

Equal Employment Opportunity Commission (1978). Uniform Guidelines on Employee Selection Procedures. Washington, D.C.

Gearhart, Maryl and Herman, Joan L. (1995). Portfolio assessment: Whose work is it? Issues in the use of classroom assignments for accountability.

Herman, Joan L.; Aschbacher, Pamela R.; Winters, Lynn (1992). A Practical Guide to Alternative Assessment. Association for Supervision and Curriculum Development, Alexandria, VA.

Hazi, Helen M. (1989). Measurement versus supervisory judgment: The case of Sweeney v. Turlington, Journal of Curriculum and Supervision, Spring 1989, 4(3), 211-229.

Ingersoll, Gary M. and Scannell, Dale P. (2002). Performance-Based Teacher Certification: Creating a Comprehensive Unit Assessment System. Fulcrum Publishing, Golden, CO.

Kane, Michael (1994). Validating the performance standards associated with passing scores, Review of Educational Research, Fall 1994, 64(3), 425-461.

Koretz, Daniel (1994). The Evolution of a Portfolio Program: The Impact and Quality of the Vermont Portfolio Program in Its Second Year (1992-1993). Report from the National Center for Research on Evaluation, Standards, and Student Testing, Los Angeles, CA. Office of Educational Research and Improvement, Washington, D.C.

Lee, William W. and Owens, Diana L. (April 2001). Court Rulings Favor Performance Measures, Performance Improvement 40(4).

Lemke, June C. (2002). Preparing the best teachers for our children. In: No Child Left Behind: The Vital Role of Rural Schools. Annual National Conference Proceedings of the American Council on Rural Special Education (ACRES) (22nd, Reno, NV, March 7-9, 2001).

McDonough, Matthew, Jr. and Wolf, W.C., Jr. (1987). Testing teachers: Legal and psychometric considerations. Educational Policy.

McGinty, Dixie (1996). The demise of the Georgia Teacher Performance Assessment Instrument, Research in the Schools, 3(2), 41-47.

Mehrens, William A. and Popham, W. James (1992). How to evaluate the legal defensibility of high-stakes tests. Applied Measurement in Education 5(3), 265-283.

Melnick, Susan and Pullin, Diana (2000). Can you take dictation? Prescribing teacher quality through testing. Journal of Teacher Education 51(4), 262-275.

National Council for Accreditation of Teacher Education (2000). Professional standards for the accreditation of schools, colleges, and departments of education. NCATE, Washington, D.C.

Moss, Pamela (1998). Rethinking validity for the assessment of teaching. In Lyons, N. (Ed.) (1998). With Portfolio in Hand: Validating the New Teacher Professionalism. Teachers College Press, New York, NY.

Nweke, Winifred and Noland, Juanie (1996). Diversity in teacher assessment: What's working, what's not? Paper presented at the Annual Meeting of the American Association of Colleges for Teacher Education (48th, Chicago, IL, February 21-24, 1996).

Pascoe, Donna and Halpin, Glennelle (2001). Legal issues to be considered when testing teachers for initial licensing. Paper presented at the Annual Meeting of the Mid-South Educational Research Association (30th, Little Rock, AR, November 13-16, 2001).

Pullin, Diana C. (2001). Key questions in implementing teacher testing and licensing, Journal of Law and Education, 30(3), July 2001, 383-429.

Rebell, Michael A. (1991). Teacher performance assessment: The changing state of the law, Journal of Personnel Evaluation in Education 5, 227-235.

Sandman, Warren (1998). Current Cases on Academic Freedom. Paper presented at the Annual Meeting of the National Communication Association, New York.

Salzman, Stephanie A.; Denner, Peter R.; Harris, Larry B. (2002). Teacher Education Outcomes Measures: Special study survey. American Association of Colleges of Teacher Education, Washington, D.C.

Sireci, Stephen G. and Green, Preston C., III (2000). Legal and psychometric criteria for evaluating teacher certification tests, Educational Measurement: Issues and Practice 19(1), 22-24.
Stiggins, Richard J. (2000). Specifications for a Performance-Based Assessment System. NCATE Web Site, on-line: http://www.ncate.org/resources/commissioned%20papers/stiggins.pdf

Wilkerson, Judy and Lang, William Steve (January 2003). Analysis of Performance Assessment Survey of Teacher Preparation Institutions. Survey analysis prepared for the Florida Department of Education, Tallahassee, FL.

Wilkerson, Judy; Lang, William Steve; Egley, Robert; Hewitt, Margaret (January 2003). Designing Standards-Based Tasks and Scoring Instruments to Collect and Analyze Data for Decision-Making. Workshop presented at the annual meeting of the American Association of Colleges of Teacher Education in New Orleans, LA.

Wilkerson, Judy (2000). Program accountability for beginning teachers' subject matter knowledge and competency and how to meet the challenge: A state's perspective on program accountability for teacher education graduates' competency. Symposium paper presented at the Annual Meeting of the American Association of Colleges of Teacher Education, Chicago, IL.

Zirkel, Perry A. (June 2000). Tests on trial, Phi Delta Kappan 81(10), 793-4.

About the Authors

Judy Wilkerson and William Steve Lang are on the faculty at the University of South Florida St. Petersburg. Both teach courses in assessment and research. His research interests include the Rasch model and performance assessment. Her interests are evaluation and accreditation standards.

Email: email@example.com & wslang@tempest.coedu.usf.edu

The World Wide Web address for the Education Policy Analysis Archives is epaa.asu.edu

Editor: Gene V Glass, Arizona State University
Production Assistant: Chris Murrell, Arizona State University