Catalog record: 002021443
Local identifier: E14-SFE0002580
Factors affecting message intelligibility of cued speech transliterators [electronic resource] / by Katherine Pelley. [Tampa, Fla.]: University of South Florida, 2008.
Title from PDF of title page.
Document formatted into pages; contains 94 pages.
Thesis (M.S.)--University of South Florida, 2008.
Includes bibliographical references.
Text (Electronic thesis) in PDF format.
ABSTRACT: While a majority of deaf students mainstreamed in public schools rely on interpreters, little research has investigated interpreter skills and no research to date has focused on interpreter intelligibility (Kluwin & Stewart, 2001). This thesis is the second in a series of experiments designed to quantify the contribution of various factors affecting the intelligibility of interpreters (transliterators) who use English-based communication modes. In the first experiment, 12 Cued Speech transliterators were asked to transliterate an audio lecture. Two aspects of these transliterated performances were then analyzed: 1) accuracy, measured as the percent-correct cues produced, and 2) lag time, the average delay between lecture and transliterated message. For this thesis, eight expert receivers of Cued Speech were presented with visual stimuli from the transliterated messages and asked to transcribe the stimuli. Intelligibility was measured as the percentage of words correctly received. Results show that a positive nonlinear relationship exists between transliterator accuracy and message intelligibility. Intelligibility improved with accuracy at the same rate for both novice and veteran transliterators, but receiver task difficulty was less for stimuli produced by veterans than novices (as evidenced by a left shift in the psychometric function for veterans compared to novices). No large effects of lag time were found in the accuracy-intelligibility relationship, but an "optimal lag time" range was noted from 1 to 1.5 seconds, for which intelligibility scores were higher overall. Intelligibility scores were generally higher than accuracy, but not all transliterators followed the same accuracy-intelligibility pattern due to other sources of variability. Possible sources of transliterator variability included rate of cueing, visible speech clarity, facial expression, timing (to show syllable stress or word emphasis), cueing mechanics, and mouth-cue synchronization.
Further research is needed to determine the impact these factors have on intelligibility so that future transliterator training and certification can focus on all factors necessary to ensure highly intelligible Cued Speech transliterators.
Mode of access: World Wide Web.
System requirements: World Wide Web browser and PDF reader.
Advisor: Jean C. Krause, Ph.D.
Communication Sciences and Disorders
USF Electronic Theses and Dissertations.
Factors Affecting Message Intelligibility of Cued Speech Transliterators

by

Katherine Pelley

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science
Department of Communication Sciences and Disorders
College of Arts and Sciences
University of South Florida

Major Professor: Jean Krause, Ph.D.
Catherine Rogers, Ph.D.
Kelly Crain, Ph.D.
Patricia Blake-Rahter, Ph.D.

Date of Approval: July 18, 2008

Keywords: Interpreter Skills, Deaf Education, Visual Communication, Lecture Reception, Accuracy, Lag Time, Experience, Rate

Copyright 2008, Katherine Pelley
Acknowledgments

I would like to express my deepest gratitude to my committee chair, Professor Jean Krause, for her persistent help and guidance on this thesis. Thank you to each of the people whose work contributed to this research project: to Morgan Tessler for the completion of the accuracy measurement guidelines manual and for accuracy measurement and analyses, to Kendall Tope for contributions to stimulus preparation and for conducting the hearing subject experiment and analysis, to Danielle Milanese for volunteering her time and transliteration skills for practice items, to Dana Husaim and Jessica Lindsay for their contributions to accuracy measurements and analyses, to Jane Smart for her contribution to CST video clip editing and lag time measurements, and to Wendy Park for completion of lag time measurements. Additionally, I would like to thank my committee members, Professor Catherine Rogers, Professor Kelly Crain, and Professor Patricia Blake-Rahter, for their contributions to this project.
Table of Contents

List of Tables iii
List of Figures iv
Abstract vi

Chapter One: Introduction 1
  Communication Options 2
    American Sign Language 2
    Manually Coded English 2
    Cued Speech 3
  Cued Speech Research 4
  Interpreting vs. Transliterating 7
  Assessment of Interpreters and Transliterators 8
    Accuracy 11
    Psychometric Functions 13
    Experience 14
    Lag Time 16
    Rate of Presentation 17
    Other Factors 19
  Statement of the Problem 21

Chapter Two: Method 22
  Participants 23
  Materials 25
  Preparation of Stimuli 29
  Selection of Stimuli 30
  Presentation Sessions 33
  Receiver Ratings and Subjective Impressions 34
  Scoring 35
  Data Analysis 36

Chapter Three: Stimulus Selection 38
  Accuracy 40
  Experience 41
  Lag Time 43
  Individual CST Representation 44
Chapter Four: Intelligibility Results 48
  Transliterator Intelligibility Differences 50
  Accuracy-Intelligibility Functions 57
  Individual Differences 62
  Effect of Experience 62
  Effect of Lag Time 64
  Lag Time-Intelligibility Function 65

Chapter Five: Discussion 68
  Other Sources of Variability 70
  Transliterator Variability 71
  Receiver Variability 75
  Role of Training and Certification 77
  Conclusions 80
  Future Work 81

References 84

Appendices 87
  Appendix A: Participant Information 88
  Appendix B: Stimulus Selection 90
  Appendix C: Receiver Impressions of Transliterator Performances 91
List of Tables

Table 1 Transliterator Experience 26
Table 2 Stimulus Block Composition 39
Table 3 Number of Clips per Experience Category 43
Table 4 Accuracy for Selected Stimuli and Full CST Performances 46
Table 5 Stimulus Characteristics and Intelligibility by Receiver 49
Table 6 Stimulus Characteristics and Intelligibility by CST 52
Table 7 Transliterator Rankings by Accuracy, Intelligibility, and Subjective Receiver Ratings 54
Table 8 Receiver Ratings of each CST, Ranked by Intelligibility Scores 56
Table 9 Accuracy, Intelligibility, and Experience Profiles for Veteran CSTs, Ranked by Intelligibility 78
Table A1 Cued Speech Receivers 88
Table A2 Background and Scores (given 100% accurate stimuli) for Normal Hearing Listeners 89
List of Figures

Figure 1. Format of survey used to collect each receiver's ratings and subjective impressions of the 12 transliterators 34
Figure 2. Scatterplots of accuracy and key word accuracy for each stimulus block 41
Figure 3. Number of clips selected for each 0.5-second range of lag time values in each stimulus block 44
Figure 4. Accuracy distribution of selected stimuli for each transliterator across all stimulus blocks 45
Figure 5. The accuracy-intelligibility relationship, with mean and mode intelligibility scores shown for each 10-point accuracy interval 59
Figure 6. The accuracy-intelligibility relationship, with the proportion of data points that reach 70% or higher intelligibility shown for each 10-point accuracy interval 61
Figure 7. Accuracy-intelligibility likelihood functions plotted for each experience category 64
Figure 8. Accuracy-intelligibility likelihood functions plotted for each 1-second lag time range 65
Figure 9. Lag time-intelligibility likelihood functions, each showing the proportion of data points that reach 70% or higher intelligibility scores for each 0.5-second lag time range 67
Figure C1. Accuracy-intelligibility likelihood functions plotted for each expert receiver, CS01 through CS04 91
Figure C2. Accuracy-intelligibility likelihood functions plotted for each expert receiver, CS05 through CS08 92
Figure C3. Accuracy-intelligibility likelihood functions plotted for each transliterator, CST1 through CST6 93
Figure C4. Accuracy-intelligibility likelihood functions plotted for each transliterator, CST7 through CST12 94
Factors Affecting Message Intelligibility of Cued Speech Transliterators

Katherine Pelley

ABSTRACT

While a majority of deaf students mainstreamed in public schools rely on interpreters, little research has investigated interpreter skills and no research to date has focused on interpreter intelligibility (Kluwin & Stewart, 2001). This thesis is the second in a series of experiments designed to quantify the contribution of various factors affecting the intelligibility of interpreters (transliterators) who use English-based communication modes. In the first experiment, 12 Cued Speech transliterators were asked to transliterate an audio lecture. Two aspects of these transliterated performances were then analyzed: 1) accuracy, measured as the percent-correct cues produced, and 2) lag time, the average delay between lecture and transliterated message. For this thesis, eight expert receivers of Cued Speech were presented with visual stimuli from the transliterated messages and asked to transcribe the stimuli. Intelligibility was measured as the percentage of words correctly received. Results show that a positive nonlinear relationship exists between transliterator accuracy and message intelligibility. Intelligibility improved with accuracy at the same rate for both novice and veteran transliterators, but receiver task difficulty was less for stimuli produced by veterans than novices (as evidenced by a left shift in the psychometric function for veterans compared to novices). No large effects of lag time were found in the accuracy-intelligibility
relationship, but an "optimal lag time" range was noted from 1 to 1.5 seconds, for which intelligibility scores were higher overall. Intelligibility scores were generally higher than accuracy, but not all transliterators followed the same accuracy-intelligibility pattern due to other sources of variability. Possible sources of transliterator variability included rate of cueing, visible speech clarity, facial expression, timing (to show syllable stress or word emphasis), cueing mechanics, and mouth-cue synchronization. Further research is needed to determine the impact these factors have on intelligibility so that future transliterator training and certification can focus on all factors necessary to ensure highly intelligible Cued Speech transliterators.
Chapter One

Introduction

In 1975, Public Law 94-142, the Education of All Handicapped Children Act, now referred to as the Individuals with Disabilities Education Act (IDEA), was passed to prevent educational discrimination against children with disabilities (Marschark, Sapere, Convertino, & Seewagen, 2005). The passing of IDEA caused a dramatic increase in the number of deaf and hard of hearing children being mainstreamed in public schools (Schick, Williams, & Bolster, 1999). While 80% of deaf students attended special schools in the 1950s and 1960s, today, 80% of deaf students attend mainstream public schools (Marschark et al., 2005). Many of these deaf and hard of hearing children rely on interpreters to gain access to the classroom and communicate with those around them. Yet limited quantitative research presently exists to verify the efficacy of educational interpreters for the deaf. Educational interpreters are known to exhibit a wide range of skill levels (Schick, Williams, & Kupermintz, 2006), but Marschark and colleagues (2005) observe that "even in optimal learning conditions, we know very little about what students can learn through an interpreter" (p. 39). Similarly, Kluwin and Stewart (2001) emphasize that there is "no empirical evidence" to indicate how well deaf students understand their interpreters (p. 15). In order to gain insight into these issues, this study is the beginning of a larger project that aims to quantitatively determine the access interpreters are providing deaf students. In particular, this study focuses on interpreters who use a communication option known as Cued Speech (CS).
Communication Options

In order to evaluate the level of access provided by interpreters, one important factor to consider is the mode of communication used by the interpreter. Deaf students who rely on educational interpreters in the United States currently use various means of communication, including, but not limited to, American Sign Language, Manually Coded English, and Cued Speech.

American Sign Language (ASL). American Sign Language is a complete language, separate and independent from English, with its own vocabulary and syntax (Scheetz, 2001). It is composed of signs (hand-movement configurations), which convey general meaning, produced with non-manual signals (facial expressions and body postures) that function grammatically and/or refine the meaning of the signs in the sentence.

Manually Coded English (MCE). The term Manually Coded English refers to sign systems which integrate components of both ASL and English, generally using or modifying signs from ASL in conjunction with structural elements of English (such as English word order and/or spoken English mouth movements). A variety of MCE systems exist either by invention, with the aim of bridging the gap between the two languages (and improving literacy in deaf children), or as a natural result of the interaction between deaf signers and hearing English speakers (Scheetz, 2001). Each MCE system varies with regard to the degree of structure it has and how closely it follows English. Signing Exact English (SEE II; Gustason, 1990) is an example of a very
structured MCE system that incorporates English elements to a high degree. In SEE II, English word order is used, and initialized signs differentiate between English words which would be represented by a single sign in ASL. In this system, any English words that share at least two of the following characteristics are represented by the same sign: a common pronunciation, a common spelling, and/or a common meaning (Gustason, 1990). Under this system, the words "bat" (as in baseball) and "bat" (as in the animal) would share the same sign because they share a common spelling and pronunciation; however, the words "rain" and "reign" would be signed differently because they share only a common pronunciation. SEE II also incorporates invented markers, created specifically to show English morphology (word endings, tenses, and affixes), and additional invented signs (beyond those available in ASL), created to show English words which could not be represented by a single sign.

At the other end of the MCE spectrum is Conceptually Accurate Signed English (CASE), a less-structured system that is less tied to English word order and grammar. Because it resembles a pidgin (a natural mixing of two languages), CASE is sometimes referred to as Pidgin Signed English. In CASE, all signs and non-manual markers are borrowed from ASL for practicality and efficiency (i.e., no invented signs are used), but English word order and mouthing of English words are still used wherever possible to provide English structure (Winston, 1989). Although the main focus is to convey English, users have more flexibility in sign choice (Winston, 1989).

Cued Speech. Cued Speech is a system for conveying a traditionally spoken language, such as English, in a fully visually accessible form (Cornett, 1967). It
combines the mouth movements of speech (with or without voice) with a system of handshapes at specified placements around the mouth, designed to clarify the elements of speech that would be ambiguous through speechreading alone ("speechreading" is another term for "lip reading" but implies watching not only the lips, but all the visible articulatory structures of speech). Handshapes represent consonant sounds and placements represent vowel sounds. Each handshape-placement combination, or cue, is produced in synchrony with the mouth movements for the corresponding consonant-vowel combination. Each of the speech sounds (or phonemes) is thus made visually distinct through cues.

Cued Speech Research

Deaf individuals and/or their parents decide to use Cued Speech as a mode of communication for a variety of reasons. Because Cued Speech is a closed system, some may find it relatively easy to learn or find it appealing that once the system has been mastered, it can be used to express any utterance that can be spoken, including unfamiliar vocabulary and foreign languages. In other instances, parents become interested in CS when they learn that it has been shown to be an effective tool of communication: children who are experienced users of Cued Speech (i.e., those with at least four years of Cued Speech experience) have near-perfect reception of cued materials, with averages for 18 profoundly deaf children ranging from 95% to 96.6% accuracy in the reception of both high and low probability key words (Nicholls & Ling, 1982). Moreover, such benefits in the reception of English via Cued Speech over speechreading alone begin to appear in
deaf and hard of hearing children within the first 1-2 years of using Cued Speech (Clark & Ling, 1976).

Not only does CS improve the efficacy of communication over speechreading alone, but a growing body of research indicates that its use also can facilitate language and literacy development in deaf children, especially when started early in life (Leybaert & Charlier, 2003). For example, Leybaert and Charlier (1996) reported that deaf children who used Cued Speech for communication had phonological skills similar to those of hearing children for tasks that involved rhyming, remembering, and spelling words. In the rhyming task, the following six groups of children were tested on their ability to detect rhymes using pictures (rather than written words, to avoid the influence of spelling): 12 hearing children (mean age 8;7 years), 16 children (mean age 10;1 years) who learned Cued Speech early, 18 children (mean age 12;7 years) who learned Cued Speech late, 12 children (mean age 13;3 years) relying on the oral method, 12 children (mean age 10;4 years) who learned sign language early, and 20 children (mean age 10;1 years) who learned sign language late. The children exposed to CS from an early age (three years old or younger) were able to detect pairs of rhyming words, whether or not the words were orthographically similar, with roughly the same level of accuracy (orthographically similar: 97.4%; orthographically different: 94.4%) as the hearing children (orthographically similar: 95.8%; orthographically different: 97.0%), outperforming all the other groups of deaf children. The pattern of performance between groups was similar for tasks related to remembering and spelling words. In addition, there is evidence that this pattern of performance may extend to reading skills as well.
While the average reading comprehension score for a deaf or hard-of-hearing eighteen year old is below the fourth grade reading level (Traxler, 2000), Wandel (1989) found that students with profound hearing loss using Cued Speech as their primary communication mode scored similarly to hearing students on the reading comprehension portion of the Stanford Achievement Test across all ages tested (7 to 16 years old).

Given these data, LaSasso and Metzger (1998) suggest that there are theoretical advantages to a communication philosophy that incorporates the use of both American Sign Language and Cued Speech as the child enters school. They argue that the language of the child's home should be his/her first language (i.e., children with hearing parents should learn English as their first language through Cued Speech and children with deaf parents should learn ASL as their first language) in order to take full advantage of using parents as a language model. LaSasso and Metzger emphasize that this philosophy would allow educators to take advantage of the benefits of both ASL and Cued Speech when teaching different subjects in school (for example, using Cued Speech when working on written English). Although this philosophy has proven attractive to some parents who choose to incorporate Cued Speech as one of their child's communication options, few schools to date have adopted this new bilingual/bicultural philosophy using ASL and Cued Speech. Therefore, for most parents, choosing Cued Speech as a communication option will mean mainstreaming their child with an interpreter who uses Cued Speech, known as a Cued Speech transliterator.
Interpreting vs. Transliterating

Depending on the communication option used by the deaf child, the terminology pertaining to an interpreting professional can vary. Interpreter is the term typically used to describe a professional who facilitates communication between two different languages (for example, between English and the separate language of American Sign Language). When referring to interpreters who use Cued Speech, the term "transliterator" is typically used. A transliterator is someone who facilitates communication between different modes of the same language (for example, between spoken English and written English or between spoken English and visual English through Cued Speech).

The larger study (Krause, 2006), of which this thesis is a part, will focus on the efficacy of transliterators because even though many deaf adults prefer to use ASL interpreters, transliterators are much more commonly used in educational settings. According to a survey by Jones, Clark, and Soltz (1997), more than 95% of educational interpreters use an English-based sign system, rather than American Sign Language. Of the 222 educational interpreters employed in Kansas, Missouri, and Nebraska who responded to the survey, 55.7% reported using Pidgin Signed English (PSE) or Conceptually Accurate Signed English (CASE) in their jobs. Another 32.7% reported using Signing Exact English (SEE II), while less than 5% of interpreters surveyed reported using pure ASL as educational interpreters. Although the exact number of Cued Speech transliterators is unknown, Cued Speech is an attractive candidate for initial study because there is a definitive mapping from the spoken message
to the cued message (and vice versa). This one-to-one correspondence of spoken English phonemes and cued phonemes allows for extremely straightforward assessment of the cues produced by a transliterator (relative to the original spoken message) and how those cues affect the message received by deaf students. Therefore, the present study will focus specifically on assessing these relationships for transliterators who use the communication option of Cued Speech.

Assessment of Interpreters and Transliterators

There is a great deal of research that needs to be done in the field of interpreting to ensure that deaf and hard of hearing students in the mainstream setting are receiving adequate access to classroom communication from the interpreters and transliterators on which they rely. This level of access depends primarily on intelligibility, or the percentage of the interpreted/transliterated message correctly received by the deaf receiver. Although research directly assessing the intelligibility of interpreters and transliterators is lacking, other methods of assessment have been used to evaluate the quality of interpreting services.

Strong and Rudser (1985), for example, created one of the first objective rating scales to assess the accuracy of sign language interpreters. They assessed interpreter performances by analyzing each proposition (defined as a unit of text carrying a single semantic idea) interpreted, rather than attempting to analyze the interpreter's entire performance with a single rating. Although some measure of subjectivity was necessary in order to include considerations of cultural adjustment (a trickier, more subjective decision-making component of this assessment scale), this rating
method proved highly reliable, with Pearson r correlation coefficients for pairs of scores ranging from .9749 to .9985 (Strong & Rudser, 1985). Moreover, this assessment method was also shown to be much more reliable than the shorter, more subjective measure for assessing interpreter skills that Strong and Rudser (1986) later developed, for which Pearson r correlation coefficients ranged from only .52 to .86.

More recently, an assessment tool that has gained widespread acceptance is the Educational Interpreter Performance Assessment, or EIPA (Schick, Williams, & Bolster, 1999). The EIPA is an evaluation tool designed specifically to assess and certify educational interpreters in a classroom setting. It evaluates the voice-to-sign and sign-to-voice skills of interpreters who use ASL, MCE, or Pidgin Signed English (Schick & Williams, 2001). Skills relating to grammar, prosody, sign vocabulary, fingerspelling, and other behaviors "critical to competent interpreting" are rated on a Likert scale from zero to five (zero being "no skills," five being "advanced") (Schick, Williams, & Kupermintz, 2006). Rating is completed by an evaluation team of three people, at least one of whom is proficient in the specific sign system used by the interpreter (Schick & Williams, 2001).

As a measurement tool, the EIPA has been shown to have good reliability, with correlations between rating teams ranging from 0.86 to 0.94 across the domains of evaluation. Coefficients of internal consistency of skills within each domain of the assessment are also high (ranging from 0.93 to 0.98), while interdomain correlations used to assess validity suggest that each domain taps a different aspect of an interpreter's performance. As further evidence of validity, 42 interpreters with RID certification
averaged a score of 4.2 (SD = .06) on the EIPA (Schick et al., 2006). Therefore, individuals with RID certification can be expected to score in the Advanced range (4.0 or better) on the EIPA.

Because of its high validity and reliability, the EIPA has become an important research tool in assessing the quality of interpreters in educational settings. For example, results of EIPA testing have provided important research data which showed that of the 2,091 sign language interpreters assessed on the EIPA from 2002 through 2004, only 38% were able to meet the minimum proficiency level of 3.5 required by most states (Schick et al., 2006). Data reported by EIPA testing have also been instrumental in identifying skill areas that need improvement (e.g., sign-to-voice skills of interpreters working with younger children). Furthermore, future EIPA results could be used to monitor changes in the quality of educational interpreters over time. As interpreter training programs are modified to better address these skill areas (and others yet to be identified), the EIPA can thus serve as an evidence-based mechanism for assessing the efficacy of interpreter training efforts.

While such research provides highly valuable assessment information regarding aspects of interpreter performance, the quantitative relationship between those aspects (e.g., accuracy) and intelligibility remains unknown. Determining this quantitative relationship is important for ensuring that quality control standards for interpreters and transliterators are appropriate. As Schick et al. (2006) point out, even though a majority of educational interpreters are unable to meet the minimum standard of 3.5, it is still unclear whether or not this minimum level is high enough to ensure access to basic
classroom content. Even if the 3.5 standard is adequate, it would be difficult to analyze how the individual skills assessed in the EIPA affect intelligibility. Although each of the subskills scored (e.g., "appropriate eye contact/movement," "developed a sense of the whole message," and "stress/emphasis of important words or phrases") is averaged equally to obtain an overall EIPA score, it is unknown whether or not each of the subskills represents aspects of equal importance for receiver intelligibility. Also, while the Likert scale is well-chosen for the purposes of the EIPA, it does not provide sufficient resolution for determining how any particular skill or subskill (e.g., "stress/emphasis of important words or phrases") affects an interpreter's intelligibility. Because any particular rating on a scale of 0 to 5 will represent some variation in ability, the differences in ability between two interpreters who share the same score (for example, 3.5) on a given skill or subskill could still be great enough to affect intelligibility substantially. Therefore, the skill must be measured using a method that produces more resolution so that more levels of skill can be represented. With such a measure, the relationship between the skill score and the intelligibility of the message can then be analyzed empirically.

Accuracy

One skill that is very likely to affect intelligibility and can be measured with sufficient resolution for such an analysis is accuracy, or the percentage of the message correctly produced by the interpreter/transliterator. Although there is no known quantitative research regarding the accuracy of most types of interpreters, there is
information available regarding the accuracy of some Cued Speech transliterators (CSTs). Pelley, Husaim, Tessler, Lindsay, & Krause (2006) analyzed the cue sequence produced by each of six transliterators relative to a target cue sequence (i.e., the correct sequence based on a phonetic transcription of the spoken message) in order to derive percent-correct scores representing each transliterator's accuracy for the two manual components of Cued Speech: handshape and placement. Of six CSTs employed in the educational setting, the average accuracy was 49% (among CSTs of different experience levels, averaged over three different rates of presentation), with 33% of the target cues omitted and 18% produced in error (i.e., substitutions, or incorrect cues). Insertions of cues accounted for an extraneous 6% beyond the expected target cues. At the phrase/sentence level, a wide range of accuracy scores resulted, ranging from near 0% to near 100%.

Such accuracy measurements, however, do not measure directly how accessible each phrase/sentence would be to a deaf receiver. The relationship between transliterator accuracy and intelligibility is currently unknown, and it must be empirically measured because the relationship is not likely to be perfectly linear. It cannot be assumed that transmitting 75% of the message faithfully renders it 75% intelligible. The only way to truly determine whether or not a particular accuracy level (50%, for example) is sufficient to provide an intelligible transliterated rendition of the original message is to determine the psychometric function that relates transliterator accuracy to intelligibility.
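The two quantities discussed above can be illustrated with a short sketch. This is not the study's actual procedure: the cue alignment below uses Python's difflib rather than the accuracy measurement guidelines manual, the cue labels are hypothetical placeholders, and the logistic psychometric function (with made-up midpoint and slope values) merely stands in for the empirical accuracy-intelligibility function this thesis sets out to measure.

```python
import math
from difflib import SequenceMatcher

def accuracy_profile(target, produced):
    """Align a produced cue sequence against the target sequence and tally
    correct cues, substitutions, omissions, and insertions, each expressed
    as a fraction of the number of target cues."""
    sm = SequenceMatcher(a=target, b=produced, autojunk=False)
    correct = subs = omits = inserts = 0
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            correct += i2 - i1
        elif op == "replace":
            # paired mismatches count as substitutions; any unmatched
            # remainder counts as omission (target side) or insertion
            subs += min(i2 - i1, j2 - j1)
            omits += max(0, (i2 - i1) - (j2 - j1))
            inserts += max(0, (j2 - j1) - (i2 - i1))
        elif op == "delete":
            omits += i2 - i1
        elif op == "insert":
            inserts += j2 - j1
    n = len(target)
    return {"accuracy": correct / n, "substitutions": subs / n,
            "omissions": omits / n, "insertions": inserts / n}

def psychometric(accuracy_pct, midpoint=60.0, slope=0.1):
    """Hypothetical logistic psychometric function mapping transliterator
    accuracy (%) to predicted intelligibility (0-1).  A smaller midpoint
    (a left shift) models an easier reception task; a larger slope models
    lower variability and a steeper rise."""
    return 1.0 / (1.0 + math.exp(-slope * (accuracy_pct - midpoint)))

# Hypothetical cue sequences: one substitution, one omission, one insertion.
target   = ["c1", "c2", "c3", "c4", "c5"]
produced = ["c1", "cX", "c3", "c5", "c9"]
profile = accuracy_profile(target, produced)
print(profile)                                  # accuracy = 0.6
print(psychometric(100 * profile["accuracy"]))  # predicted intelligibility
```

Note that the scoring sketch makes the chapter's point concrete: an accuracy of 60% does not by itself imply 60% intelligibility; the predicted intelligibility depends entirely on where the (unknown) psychometric function sits and how steep it is.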
Psychometric Functions

A psychometric function is a graphic plot of data points which relates the physical characteristics of a stimulus to an associated psychological percept. The psychological percept (output), which is some aspect of participant performance, is plotted as a function of changes in the physical characteristics of a stimulus (input), such as sound pressure level (in dB). Psychometric functions are usually sigmoid-shaped, with two properties that characterize the relationship between the input and output variables: slope and left-right shift. The slope is calculated on the linear portion of the function, which typically occurs between 20% and 80% of the maximum value of the output variable (Wilson & Strouse, 1999). The slope demonstrates how rapidly the psychological percept (plotted along the y-axis) changes with increases in the physical characteristics of the stimulus (plotted along the x-axis). It is highly influenced by inter- and intra-subject variability, with higher variability yielding a flatter slope and lower variability yielding a steeper slope. The left-right shift of a psychometric function is determined by the difficulty of the task (Wilson & Strouse, 1999), with easier tasks yielding psychometric plots at lower x-axis values (farther left) than harder tasks.

Given these properties, psychometric functions are a useful tool to characterize and display the effect of various factors on intelligibility. Many such psychometric functions have been documented previously for various factors influencing speech reception (e.g., Wilson & Strouse, 1999), but no psychometric functions have been obtained for Cued Speech or any other visual communication mode. Psychometric functions for each of the visual communication modes are thus necessary to demonstrate
the role of the variables involved in their reception. As research is already available regarding the accuracy of Cued Speech messages produced by transliterators, the goal of this thesis is to determine a psychometric function for Cued Speech transliteration in order to characterize the relationship between the accuracy of the transliterated message (the independent variable, plotted along the x-axis) and message intelligibility (the dependent variable, plotted along the y-axis). In determining the accuracy-intelligibility psychometric function for Cued Speech transliterators, other factors that influence this relationship can also be explored. A number of factors are already known to influence the shapes of psychometric functions (Wilson & Strouse, 1999), including stimulus material (i.e., words vs. sentences), presentation level, type of grading (i.e., key word versus all word), population, and response mode (i.e., writing English responses vs. identifying by pointing). For Cued Speech transliterators, it is expected that additional factors, such as CST experience, lag time, and rate of presentation, will also have similar effects on the shape of the accuracy-intelligibility psychometric function.

Experience. Many factors contribute to transliterator intelligibility and may influence the accuracy-intelligibility psychometric function. One such factor is transliterator experience. Veteran CSTs are generally more accurate than novices and are therefore likely to be more intelligible on average. In a study of six CSTs who were asked to transliterate materials at three different presentation rates, three of four veteran transliterators averaged 60% correct cues, compared with only 45% correct cues for the two novices (Pelley et al., 2006). As a result, the overall relationship between
intelligibility and transliterator experience is expected to be similar to the relationship found by Pelley et al. (2006) between accuracy and experience. As experience increases, intelligibility is expected to be higher, with some exceptions due to differences in individual skill level.

Of more relevance for this study, however, is that Pelley et al. (2006) found differences in the error patterns of veterans and novices: veterans produced substitution errors most frequently (25%) and fewer omission errors (15%), while novices' errors followed the opposite pattern (50% omissions and only 3% substitutions). Therefore, it is likely that even when accuracy is controlled, novices and veterans may vary in intelligibility due to effects of error type. In analyzing accuracy differences between the experience groups, Pelley et al. observed that veterans cued a large majority of the message and cued faster than novices, but often "hypocued," losing form and precision to cope with increasing speed. This technique produced more substitutions but allowed veterans to retain more cues; the cues they omitted were mostly within words and short sequences. Novices, on the other hand, tended either to cue slowly and highly accurately, with correct form, or to omit large chunks of the message entirely. Thus, when accuracy is controlled, it is likely that veterans will generally have higher intelligibility than novices, in spite of losing form and precision, because more of the message will be at least partially transmitted, allowing the receiver to fill in gaps and correct errors more easily than if a large portion of the message is simply missing (as is the case with the novices). However, novices could be more intelligible than veterans if the veterans' errors are undecipherable to the deaf receiver, which is more likely to be the case as the
number of errors increases (i.e., as accuracy levels decrease). The accuracy percentage at which a veteran (with faster cueing ability) becomes so inaccurate that his/her intelligibility is poorer than that of a novice (who is unable to cue rapidly and therefore omits large portions of the message) is unknown. This study should demonstrate whether omissions or substitutions are more detrimental to intelligibility, by comparing the left-right shift (the accuracy level at which intelligibility decreases dramatically) of the psychometric functions for transliterators of different experience levels.

Lag time. Another factor that may influence the accuracy-intelligibility psychometric function is lag time. Lag time is the amount of time between the original source message and the interpreter's production of that message. Although the relationship between interpreter/transliterator intelligibility and lag time has yet to be established in research, there are some data regarding the relationship between accuracy and lag time. Cokely (1986) studied four ASL interpreters (with certifications from the Registry of Interpreters for the Deaf) at a national conference and found that the average onset lag time for the two interpreters with higher accuracy was 4 seconds (ranging from 1 to 6 seconds), while the two poorer performing interpreters had shorter lag times, averaging 2 seconds (ranging from 1 to 5 seconds). Specifically, the interpreters with longer lag times produced a greater number of sentences, and a greater number of correct sentences, than the interpreters with shorter lag times (Cokely, 1986). Cokely concluded that as lag time increases, the accuracy of ASL interpreters also increases, with an expected ceiling due to working memory. In addition, he hypothesized that interpreter lag time is largely a function of the structural differences between the source
language and the target language: the more different the structures of the two languages, the longer the lag time is expected to be. Thus, CS transliterators could be expected to require shorter lag times than ASL interpreters, as a result of the relative similarity between spoken English and cued English (through the phonemic level), in contrast to ASL, which is an entirely different language from spoken English.

While preliminary evidence does suggest shorter lag times for CS transliterators, it also suggests an inverse relationship between accuracy and lag time for these individuals. Pelley (2006) reported that average lag times for one transliterator increased from 1.11 seconds for materials produced with 71% accuracy to 1.23 seconds and 1.36 seconds for materials at 59% and 49% accuracy, respectively. However, the decreases in accuracy were also associated with increases in presentation rate, and it is not yet known which factor (lag time or presentation rate) is primarily responsible for the observed changes in accuracy. Whether the same inverse relationship between accuracy and lag time holds when presentation rate is controlled is not yet known. As a result, the overall relationship between intelligibility and lag time for CS transliterators is difficult to predict. Of more interest for this study, however, is whether differences in lag time affect intelligibility even when accuracy is controlled.

Rate of presentation. A transliterator's rate of cueing is yet another factor that could affect intelligibility and the accuracy-intelligibility psychometric function. While no research has yet been conducted regarding the effect of cueing rate on the intelligibility of Cued Speech, multiple studies have evaluated the effect of rate on a person's ability to perceive linguistic stimuli in various other communication modes,
including American Sign Language, the Rochester Method (or fingerspelling), and spoken English. Regardless of the communication mode, data suggest a production bottleneck for visually communicated sentences; that is, sentences can be received correctly at faster rates than they can be physically produced. Fisher, Delhourne, and Reed (1999), for example, increased the playback rates of videotaped ASL signs and signed sentences and evaluated the percent-correct scores of 14 native ASL viewers. Breakdowns in the ability to process the ASL stimuli did not occur until time compressions of 2.5 to 3 times the normal rate were made. This finding parallels the findings of Beasley, Bratt, and Rintelmann (1980), who researched the auditory reception of time-compressed speech (in sentences) and found near-perfect intelligibility for sentences at time-compression factors up to 2.5, with intelligibility decreasing to 82% at a compression factor of 3.3 (the only compression factor tested above 2.5). Given that the average words per minute of sign language and speech are approximately equivalent (although speech is much more rapid, signing requires fewer units; Bellugi & Fisher, 1972), these findings suggest an upper limit to language processing that is independent of language and modality.

Not every communication mode can be used effectively at these rates, however. While the normal speech rate is 4 to 5 syllables per second, fingerspelling is four times slower, at 0.5 to 2 syllables per second (Reed, Delhourne, Durlach, & Fisher, 1990). Even so, fingerspelling exhibits a similar production bottleneck. Reed et al. investigated the effects of time compression on the playback of videotaped fingerspelling and found that 6 deaf participants (ages 63 to 87) were able to receive substantial amounts of linguistic
information at two to three times the normal rates of fingerspelling (again, rates in excess of what is physically possible to produce). While the average production rate of Cued Speech transliterators is unknown, the physical constraints on producing Cued Speech at a rapid rate are likely similar to those of fingerspelling, since both systems consist of individual characters produced by one hand. Cued Speech, however, requires fewer individual characters per word, because only one cue is required for each CV phoneme pair (i.e., roughly one cue per syllable), as opposed to the one character per written grapheme required in fingerspelling. Therefore, the average rate of production of Cued Speech is expected to be faster than fingerspelling and similar to, but slower than, speech.

Regardless of its average rate of production, it seems likely that Cued Speech also exhibits a production bottleneck; that is, cued materials are likely to retain intelligibility at up to 2 to 3 times the average rate of production, at least when materials are cued with 100% accuracy. When the cued message contains errors, however, it is unknown whether it can be processed over such a wide range of rates. Increases in rate leave less time to complete the perceptual task of receiving and processing the interpreted/transliterated message: the less time available, the less opportunity for recognizing and correcting errors in production. As a result, it is possible that as rate of presentation increases, the intelligibility of Cued Speech transliterator performances of a given accuracy may decrease.

Other factors. In addition to experience, lag time, and rate of presentation, a variety of other factors may influence the accuracy-intelligibility psychometric function. Specifically, two
interpreters with the same accuracy can differ in the manner in which they portray a message, making the message more or less clear to the deaf consumer (hence affecting intelligibility, but not accuracy). These factors include, but are not limited to, speechreadability (how well the transliterator visibly articulates every word, even when omitting cues or hypo-cueing), prosody (how well the transliterator pauses, portrays stress patterns, etc.), and transliterator error patterns. Different error patterns that are likely to affect intelligibility, but not accuracy, include:

1) whole-word omissions versus intra-word omissions: If part or most of the word is cued, there is more information available to the deaf receiver than if the entire word is missing.

2) type of word in error: If some transliterators, especially more experienced transliterators, put greater emphasis on cueing important content words (i.e., nouns and verbs such as "research") and less emphasis on function words (i.e., "the" and "to"), the psychometric relationship between intelligibility and the accuracy of key words only would exhibit a right shift of values along the x-axis, in comparison to the psychometric relationship between intelligibility and overall accuracy.

3) placement versus handshape errors: It is possible that some placement errors may be less detrimental to intelligibility than handshape errors. For example, when cueing rapidly, veteran transliterators sometimes fail to achieve correct placement but still move toward the placement. In this case, the cue would be classified as a substitution error for placement, but some placement information would still be available to the deaf receiver.
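The distinction between key-word scoring and all-word scoring can be made concrete with a minimal sketch. All names here are hypothetical and this is not the scoring code used in the study; it simply shows why key-word accuracy exceeds overall accuracy when a transliterator favors content words.

```python
def accuracy(scores, keyword_flags=None):
    """Mean per-word cue accuracy, optionally restricted to key words.

    `scores`        : per-word accuracy percentages for one phrase
    `keyword_flags` : optional parallel list of booleans marking key words
    """
    if keyword_flags is not None:
        scores = [s for s, k in zip(scores, keyword_flags) if k]
    return sum(scores) / len(scores) if scores else 0.0

# A transliterator who favors content words: key words cued well,
# function words dropped. Key-word accuracy exceeds overall accuracy.
word_scores = [100, 0, 90, 0, 100]           # hypothetical per-word scores
key_words   = [True, False, True, False, True]
print(accuracy(word_scores))               # → 58.0 (all words)
print(accuracy(word_scores, key_words))    # key words only: ~96.7
```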
Statement of the problem

This thesis focuses on how the intelligibility of Cued Speech transliterators, as measured by the percentage of transliterated words in meaningful sentences correctly received by deaf consumers, varies with the accuracy of the message produced by the transliterator. It is hypothesized that intelligibility will have a positive relationship with transliterator accuracy and that the psychometric function characterizing this relationship will be nonlinear. The shape of the accuracy-intelligibility psychometric function is expected to be affected by experience, with veterans producing psychometric functions that are less steep (more variable) and farther to the left compared to novices. As described earlier, psychometric functions shift to the left with easier tasks, and receiving transliterated messages from veteran transliterators is expected to be an easier task overall (for a given accuracy level) than receiving messages from novice transliterators. The effect of lag time on the accuracy-intelligibility psychometric function is expected to be minimal because lag time is thought to be directly and inversely correlated with accuracy. Even when experience and lag time are controlled, it is expected that variability due to other factors will affect the psychometric function. For example, it is hypothesized that in some cases two transliterators with the same accuracy will have different intelligibility scores, due to differences in cueing rate, speechreadability, or prosody.
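The hypothesized sigmoid shape, with its slope and left-right shift, can be sketched with a logistic form. This is a minimal illustration under an assumed parameterization; the thesis does not commit to a specific functional form, and the parameter values below are arbitrary.

```python
import math

def psychometric(x, midpoint, slope):
    """Sigmoidal (logistic) psychometric function, 0-100% output.

    `x`        : stimulus level (here, transliterator accuracy in %)
    `midpoint` : x-value at 50% performance; a smaller midpoint is a
                 leftward shift (an easier receiving task)
    `slope`    : steepness of the linear portion; flatter slopes reflect
                 higher inter- and intra-subject variability
    """
    return 100.0 / (1.0 + math.exp(-slope * (x - midpoint)))

# At the same 50% accuracy, an easier task (midpoint 40, e.g. a veteran's
# stimuli) yields higher intelligibility than a harder one (midpoint 60):
print(round(psychometric(50, midpoint=40, slope=0.15), 1))  # → 81.8
print(round(psychometric(50, midpoint=60, slope=0.15), 1))  # → 18.2
```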
Chapter Two

Method

The main focus of this study is to characterize how the intelligibility of a transliterated message, as measured by the percentage of words correctly perceived by expert Cued Speech receivers, varies with the accuracy level of the cued message produced by the transliterators. In addition, the effect of each transliterator's professional experience on this relationship is examined, and the relationship between lag time and intelligibility is also explored. Finally, in situations where two stimulus items were cued with the same accuracy percentages but did not result in the same intelligibility scores, the possible role of other factors is noted, including the transliterator's mouth clarity, the rate at which the sentence was cued, the type of accuracy errors produced (e.g., omissions of entire words versus individual cues), and the synchronization between cues and mouthshapes.

Two types of intelligibility are measured: original message (OM) intelligibility, or the percentage of the original spoken message that is correctly received, and transliterated message (TM) intelligibility, or the percentage correctly received of just that portion of the message that was actually cued by the transliterator. OM intelligibility is an overall measure of intelligibility and was used to analyze the accuracy-intelligibility relationship. It captures how much access deaf receivers actually receive from a transliterated lecture. However, the analysis of OM intelligibility alone has limitations. It cannot determine what portion of the unintelligible information can be attributed to
omissions made by the transliterator, as opposed to receiver errors, nor can it differentiate between how much information the receivers filled in from context and how much information the transliterator gave them at least some access to. By analyzing the intelligibility of only those words which the transliterator actually mouthed and cued correctly, or at least partially correctly, TM intelligibility provides a better measure of how much of the transliterator's message was successfully received.

Participants

Eight expert Cued Speech receivers were recruited to participate in intelligibility tests for this thesis. Although the results of this study will have implications for children who use Cued Speech transliterators in educational settings, children were not used because perception abilities and language skills, even for older children, may still be developing; therefore, only participants 18 years of age or older were included. In addition, all participants were required to be high school graduates, to pass a language screening, and to present with no known visual acuity problems, as participants were required to view video recordings on a computer monitor.

The Expressive Written Vocabulary section of the Test of Adolescent and Adult Language, Third Edition (TOAL-3; Hammil, Brown, Larsen, & Wiederholt, 1994) was used to screen for basic proficiency in written English. The Written Vocabulary section of the TOAL-3 requires examinees to correctly construct a written sentence for each vocabulary word given. In isolation, this section of the TOAL-3 functions as a language screening tool, with normative data provided for normal-hearing children and adults.
Because the normative data do not include deaf and hard of hearing individuals, this screening procedure ensured that all participants, deaf or hearing, possessed an English proficiency level on par with typical high school graduates. Participants were required to score within one standard deviation of age-appropriate averages. All participants who were screened met this requirement (see Appendix A, Table A1).

Expert Cued Speech receivers were recruited via advertisements sent to several regional and national Cued Speech organizations. To qualify as an "expert" Cued Speech receiver, each participant was required to meet the following criteria: 1) introduced to Cued Speech as a communication mode before age 10, 2) used Cued Speech receptively (or receptively and expressively) at home (with at least one parent) and at school (through a teacher or transliterator), and 3) used Cued Speech for at least 10 years. Additionally, participants were required to pass a visual-only receptive Cued Speech screening. Receptive Cued Speech skills were screened by presenting videos of conversational sentences that were cued with 100% accuracy for participants to view and transcribe. Participants viewed the cued sentences on the computer monitor and were asked to transcribe each sentence word for word by writing their responses. Five conversational sentences obtained from a list of Clarke sentences (Magner, 1972) were presented via Cued Speech (no audio), and only participants who correctly transcribed 90% or more of the words in these five sentences were included in this study. Because these individuals participated in the portion of the experiment that does not incorporate audio information, there were no exclusionary criteria based on hearing level. However, each expert cue receiver was
required to complete a hearing and communication background survey, which asked them to classify their hearing levels as normal hearing or as mild, moderate, severe, or profound hearing loss and to provide some basic information about their communication background. All background information collected on participants, including the results from this survey and from the CS screening tests, is summarized in Table A1 of Appendix A.

In a companion study, 12 normal-hearing individuals were also employed as control subjects in order to determine the baseline intelligibility of the lecture material used in the experiment (Tope, 2008). Background information on these individuals is summarized in Table A2 of Appendix A.

Materials

The video materials utilized in this study were taken from the videotaped performances of 12 Cued Speech transliterators (CSTs) of varying experience levels who participated in the larger grant study (Krause, 2006). The experience level of each transliterator had been classified for that study as "novice," "experienced," or "veteran." The background information and corresponding experience level classifications for the 12 CSTs are summarized in Table 1. Classifications were based on responses to written questions regarding level of education, relevant certifications, amount of continuing education (in hours per year), and experience (in years) as a Cued Speech transliterator. The experience categories were defined (Krause, 2006) as follows:

1) novice: minimal certification or no certification, with work experience of less
than the equivalent of one full-time year

2) experienced: minimal certification with less than the equivalent of three full-time years of work experience, or no certification with 3-5 years of experience

3) veteran: highest level of certification and/or more than 5 years of experience.

Table 1
Transliterator Experience

CST     Work Experience as a CST   Hours Per Week Transliterating   Relevant Certifications       Study
CST1    1 year                     2 hrs. 20 min.                   Certified Edu. CST by State   Husaim & Tessler (2006)
CST2    10 yrs.                    35 hrs.                          Certified Edu. CST by State   Husaim & Tessler (2006)
CST3    4 yrs.                     2 hrs. 15 min.                   Certified Edu. CST by State   Husaim & Tessler (2006)
CST4    22 yrs.                    35 hrs.                          Certified Edu. CST by State   Husaim & Tessler (2006)
CST5    15 yrs.                    35 hrs.                          Certified Edu. CST by State   Lindsay (2006)
CST6    15 yrs.                    30 hrs.                          Certified Edu. CST by State   Pelley (2006)
CST7    9 yrs.                     30 hrs.                          Natl. Certified (TSC)         Tessler (2007)
CST8    5 yrs.                     35 hrs.                          None                          Tessler (2007)
CST9    15 yrs.                    32.5 hrs.                        None                          Tessler (2007)
CST10   18 yrs.                    35 hrs.                          None                          Tessler (2007)
CST11   6 yrs.                     5 hrs.                           None                          Tessler (2007)
CST12   20 yrs.                    3 hrs.                           Natl. Certified (TSC)         Tessler (2007)
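The three category definitions above can be expressed as a small rule sketch. The certification encoding ("none", "minimal", "highest") is an assumption for illustration, not the study's own coding; note also that the stated definitions overlap for a minimally certified CST with under one year of experience, which this sketch resolves in favor of "novice" (another assumption).

```python
def experience_category(years, certification):
    """Classify a CST per the study's stated definitions (illustrative).

    `years`         : full-time-equivalent years of work experience
    `certification` : assumed encoding, one of "none", "minimal", "highest"
    """
    if certification == "highest" or years > 5:
        return "veteran"
    if certification in ("none", "minimal") and years < 1:
        return "novice"  # checked first: overlaps with "experienced" rule
    if (certification == "minimal" and years < 3) or \
       (certification == "none" and 3 <= years <= 5):
        return "experienced"
    return "unclassified"

print(experience_category(22, "minimal"))  # → veteran (more than 5 years)
print(experience_category(0.5, "none"))    # → novice
```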
The CSTs transliterated a lecture about plants that was based on a 25-minute educational film entitled Life Cycle of Plants (Films for the Humanities, 1989). The original film contained video images with audio narration and was chosen because it was 1) part of the materials used by the University of South Florida's Educational Interpreting Program, 2) designed to be used in a high school setting, and 3) consistent in vocabulary and pacing throughout the audio narration. The audio narration was broken into three sections of roughly equal length, and each section was re-recorded at a normally paced presentation rate (i.e., speaking rate) with deliberate pauses between phrase boundaries. The resulting recordings were slowed by an expansion factor of 1.25 to create a version of the materials at a "slow-conversational" presentation rate and sped up by a compression factor of 0.8 to create a version at a "fast-conversational" presentation rate. Each transliterator was then presented with the three sections of the lecture at three different conversational presentation rates (one section per presentation rate): slow-conversational, measured at 88 words per minute (wpm); normal-conversational, measured at 109 wpm; and fast-conversational, measured at 137 wpm. The transliterations elicited were filmed using a digital video camera and saved to a computer disk for analysis. Although each CST was exposed to every section and every presentation rate, only 4 CSTs transliterated any particular section at a given rate because the materials were counterbalanced across presentation rates. Of the video materials available at each of the three rates, only the transliterations elicited at the slow-conversational rate were used in this study in order to 1) avoid any uncontrolled impact on intelligibility due to varying rates of transliteration, 2) ensure the clips available
contained a wide range of accuracy performances (avoiding the lower ceilings in accuracy performance which may occur at faster conversational rates), and 3) model the best-case transliteration setting, in which the speaker's rate is a slower conversational rate with many pauses.

In anticipation of intelligibility experiments such as this study, the video performance of each transliterator was then edited into short video clips using Adobe Premiere Pro 1.5. The video performances were segmented at phrase boundaries (one phrase per video clip), resulting in roughly 80 video clips per transliterator at the slow-conversational rate. When possible, clip boundaries were aligned with visual "break points" in the video that corresponded to the deliberate pauses in the audio recording. In other words, visual "break points" were points in the video where the transliterator had finished cueing the phrase or sentence and paused before cueing the next phrase or sentence, thus retaining the speaker's pause from the audio recording. There were times, however, when the transliterator 1) cued a liaison at a natural break point (thereby connecting two separate phrases together with cues), or 2) became unsynchronized, erroneously producing a cue from the boundary of one phrase or sentence with the mouth movements of another (making it impossible to divide the two sentences without the confusion of cue information belonging to a neighboring sentence). In these cases, two phrases or short sentences were either combined (if the resulting combination did not exceed 12 words) or divided at alternate break points, provided that the modified clips were semantically appropriate and did not contain more than 12 words. This upper limit on phrase length was instituted in order to maximize the likelihood that participants
would be able to remember the words in the video clip long enough to be able to write them down. Given that most people can remember seven, plus or minus two, unrelated meaningful bits of information (Miller, 1956), it is reasonable to expect participants to remember 12-word phrases, considering that each phrase contains no more than 7 content words and that the words in each clip are related and frequently organized into units (for example, the words within a prepositional phrase, such as "at the bottom of," would most likely be remembered not as individual pieces, but as a single unit).

Preparation of Stimuli. In order to analyze the relationship between accuracy and intelligibility, and the secondary effects of lag time and experience on this relationship, it was necessary to note the accuracy score and lag time associated with each clip, as well as the experience level of the transliterator who produced the clip. Accuracy data for each clip were derived from an existing database of cue-by-cue accuracy measurements created for previous studies in our laboratory (Pelley et al., 2006; Tessler, 2007). In this database, every cue produced by each transliterator was classified as a correct cue, a substitution (a cue containing incorrect handshape and/or incorrect placement), an omission (deletion of a required cue), or an insertion (introduction of an unnecessary cue). Accuracy data for each clip were then computed in Microsoft Excel using these cue-by-cue classifications, with formulas created to determine the accuracy score for the specific portion of the database corresponding to the cues in the clip. In addition to the accuracy data per clip, key word accuracy data for each clip were also derived from the existing accuracy data in order to allow for analysis of the relationship between key word
accuracy and key word intelligibility. Key words were identified by a panel of transliteration experts in part of the larger project as "content words that need to be in the script for full comprehension" (Kile, 2005). Key word accuracy data for each clip were calculated in Excel by computing the cue-by-cue accuracy percentages for each key word, then averaging the key word accuracy percentages across the key words found within the same clip.

A similar database of lag time measurements was partially completed for other studies (Park, 2005; Smart, 2007). That database consisted of measurements for eight of the twelve transliterators, with lag times calculated for the beginning word, ending word, and middle syllable in each phrase or sentence. Corresponding measurements were completed for the remaining four transliterators, and lag time data for each clip were computed by averaging the three individual lag times (beginning, middle, and end) within each video clip. If one or more of the three lag times within a clip could not be calculated because the word was skipped by the transliterator, then the average lag time was based on the one or two available lag times for the clip.

Selection of Stimuli. In all, roughly 900 clips elicited at the slow-conversational rate were available for use in this study (3 sections x 4 transliterators per section x 75 phrases per section/transliterator). These clips included approximately four instances (one per CST) of each of the roughly 225 phrases in the audio narration. A subset of the video clips was selected and assembled into four stimulus blocks such that all phrases from the audio narration could be presented to the receiver in order. Thus, each stimulus block consisted of approximately 225 clips. The video clips in each stimulus block were
not only selected in order to display all phrases from the narration, but also with the primary objective of selecting clips with a variety of accuracy scores ranging from near 0% to near 100%. Within this range, clips were selected to sample values across the accuracy spectrum, in order to provide a continuous accuracy variable along the x-axis for comparison with the continuous dependent variable, intelligibility. To achieve this range, a total of nine accuracy levels were targeted (10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%), with the goal of selecting 20 to 30 clips at each accuracy level for each stimulus block. Clips qualified for selection at a given accuracy level when the clip's accuracy score was within +/-5% of that target level.

A secondary goal in stimulus selection was that the 20 to 30 clips selected at a given accuracy level also be distributed as equally as possible across locations in the lecture. In addition, clips at each accuracy level were drawn from a variety of transliterators, when possible, to allow for analysis of the effect of transliterator experience level on the accuracy-intelligibility relationship. Similarly, clips with a variety of lag times were also selected, so that the effect of different lag times on the accuracy-intelligibility relationship could be analyzed. In order to manage these competing constraints during clip selection, a master spreadsheet was constructed to display all relevant information regarding each clip: the transliterator's experience code, the lecture section (1, 2, or 3), and the target phrase or sentence, as well as the accuracy score and average lag time corresponding to the clip. Color coding was applied to values within each characteristic (accuracy, key word accuracy, experience, CST, and lag time) that were in short supply, with yellow, orange, and red indicating severity of shortage (calculated from histograms
of the values available within each characteristic). The spreadsheet also denoted any phrases that were omitted entirely by a particular transliterator, as well as any clips that were created from alternate break points.

As clips were selected, a second spreadsheet displayed running totals for each stimulus characteristic within various ranges, as well as the ideal number of clips corresponding to each of those ranges, for reference. Running totals were constantly reviewed as the experimenter attempted to meet as many of the stimulus selection goals as possible. Clips were selected from the most constrained accuracy target ranges first and then in order either from the lowest accuracy range to the highest (Stimulus Blocks 1, 3, and 4, since more variety remained at the higher accuracy ranges) or the reverse (Block 2). Within a target range, color coding was used to select clips with severely needed values whenever possible.

In summary, individual transliterated video clips were selected 1) to display all phrases from the narration to the receiver in order, 2) to display 20 to 30 clips at each accuracy level, 3) to distribute accuracy scores across locations in the lecture to the maximum extent possible, 4) using a variety of transliterators at different experience levels, and 5) with clips that contain a variety of lag times. Using this stimulus selection procedure, four stimulus blocks were created, each consisting of a different set of video clips. The stimulus blocks were counterbalanced across participants in order to minimize the effect of context and the impact of coincidental pairings of difficult phrases with poorer accuracies.
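The accuracy-level binning used during clip selection could be sketched as follows. This is a simplified illustration with hypothetical field names; the actual selection also balanced lecture section, transliterator, and lag time, as described above.

```python
def bin_clips_by_accuracy(clips, targets=range(10, 100, 10), tolerance=5):
    """Group clips into target accuracy bins (target +/- tolerance, in %).

    `clips` is a list of (clip_id, accuracy_percent) pairs. Each clip is
    assigned to at most one bin; a score exactly at a bin edge (e.g. 45%)
    goes to the lower target because of the loop order.
    """
    bins = {t: [] for t in targets}
    for clip_id, acc in clips:
        for t in targets:
            if abs(acc - t) <= tolerance:
                bins[t].append(clip_id)
                break
    return bins

clips = [("c1", 48.0), ("c2", 52.5), ("c3", 91.0), ("c4", 7.0)]
bins = bin_clips_by_accuracy(clips)
print(bins[50])  # → ['c1', 'c2']
print(bins[90])  # → ['c3']
```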
Presentation Sessions

All participants were tested individually at a computer in a sound-treated room at the University of South Florida. Testing was conducted in two 2-hour sessions, with one 15-minute and two 10-minute breaks per session. Participants were also encouraged to take breaks as necessary to maximize attention.

The original film Life Cycle of Plants was presented (without audio) in short segments (i.e., scenes) on a computer monitor. After each segment, the film was paused, and one or more stimulus items were presented. The stimulus item(s) corresponded to the audio narration for that segment of the film, so that the film segment that preceded them served to provide relevant context and simulate the classroom environment. Each stimulus item consisted of one phrase (i.e., one video clip) of the transliterated message. Given the goals of stimulus selection, consecutive stimulus items were not necessarily produced by the same transliterator. Nonetheless, the available materials dictated that stimulus items selected for each of the three lecture sections could be drawn from a subset of only four CSTs, which afforded cue receivers a chance to become familiar with the CSTs throughout the course of a lecture section. Cue receivers were instructed to type verbatim what the transliterator cued, and were permitted to view each segment of the educational film and its corresponding stimulus items only one time.

All receivers controlled the rate of presentation of the stimuli via a user interface implemented in Matlab. The interface consisted of a window for displaying the video clips, a response blank for collecting their responses, and a "play" button to click when they were ready for the next item.
Receiver Ratings and Subjective Impressions

Cued Speech receivers were also asked to rate the transliterators and to provide their subjective impressions regarding each transliterator's performance. After each section of the lecture was completed, receivers were given a short survey that included pictures (for reference) of the four transliterators they had just finished viewing. Figure 1 shows the format of the survey used to collect these data from each participant. At the conclusion of the experiment, each receiver was also asked to review all 12 transliterators, to circle any that were "highly effective," and to place an "X" next to any that were "highly ineffective." No limits (minimum or maximum) were placed on the number of transliterators that could be placed in either category.

    Most effective of these four transliterators
    Least effective of these four transliterators
    How would you feel about using this transliterator?
        Very comfortable / OK / Concerned I might miss something
    Anything about this CST's cueing that you really liked or didn't like?

Figure 1. Format of survey used to collect each receiver's ratings and subjective impressions of the 12 transliterators.
Scoring

Intelligibility scoring included analysis of both OM intelligibility and TM intelligibility. For original message intelligibility, two types of scores were calculated to determine the intelligibility of 1) all words and 2) key words only (the same key words as those used for key word accuracy, described earlier). Percent-correct intelligibility scores were tabulated by examining the agreement between the typed responses and the original spoken messages corresponding to the transliterated phrases. When scoring all words, each word was required to be exactly correct, with the exception of obvious errors (explained below). When scoring key words only, morphological errors involving the addition or deletion of affixes were considered acceptable, such as the omission of "-ing" from "scurrying," provided that the stem of the word was perceived correctly. In both types of scoring, credit was given for obvious spelling and typographical errors as well as homophonous words, but lexical errors (such as "grow" instead of "thrive") were considered incorrect, in order to exclude situations where the receiver did not have access to a word because of poor accuracy but filled in the gaps given the context of the information (because intelligibility, not comprehension, was the measure of interest).

For transliterator message intelligibility, only key word scores were measured. Participant responses were judged against only those key words from the original message that the transliterator actually attempted to cue (i.e., any key words the transliterator provided by mouthing and correctly cueing at least some portion of the key words). Participant responses were graded first by an autoscoring program in Matlab,
which automatically scored an entire sentence as "correct" when the participant's response contained all of the words to be scored (all words or key words) with exact spelling. The program also generated an Excel file containing any sentence that was not entirely correct. This file was then hand graded by the experimenter, and credit was given for simple typographic errors and for errors in spelling that did not change the word phonologically (for example, the entry of a homophone such as "there" for "their"). For key word grading, the experimenter also gave credit for morphological errors that involved the addition or deletion of an affix.

Data Analysis

OM and TM percent-correct intelligibility scores from the experiment were calculated for each receiver and transliterator as well as for individual stimulus items. Transliterator scores were used to obtain an overview of the accuracy-intelligibility relationship (by comparing intelligibility scores with accuracy averages for each transliterator), and receiver scores were used to look for variability between receivers.

The OM scores for individual stimulus items were compiled to construct three scatterplots and corresponding psychometric functions for transliterator intelligibility as a function of accuracy: 1) all word intelligibility vs. accuracy, 2) key word intelligibility vs. accuracy, and 3) key word intelligibility vs. key word accuracy. Key word intelligibility was plotted as a function of both accuracy and key word accuracy in order to focus on the effect of transliterator error patterns (key word errors vs. errors in other words) on the accuracy-intelligibility relationship. For each of these three sets of data, the strength of
the relationship between intelligibility and accuracy was measured using a Spearman's rho correlation (a non-parametric test of correlation was used because the data were not normally distributed). In addition, the shape of the function was explored by determining what type of function appeared to fit the data best (linear, polynomial, probit, etc.) in a least-squares sense.

The resulting functions were also analyzed with regard to factors such as transliterator experience and transliterator lag time. The three accuracy-intelligibility psychometric functions were plotted for each experience group (veteran, experienced, and novice) separately in order to examine the effect of experience on the shape of the functions. Then, the effect of lag time on the shape of the accuracy-intelligibility psychometric functions was examined by plotting these functions for several different lag times (e.g., "short," "medium," and "long"). In addition, the relationship between lag time and intelligibility was investigated, first by constructing a psychometric function with lag times for each stimulus item plotted along the x-axis and corresponding all word OM intelligibility scores plotted along the y-axis, and then by measuring the strength of the relationship by conducting a Spearman's rho correlation between the two variables. Finally, for stimulus items with the same accuracy but different intelligibility scores, further analysis, including analysis of subjective receiver responses, was informally conducted regarding the effect of cueing rate, transliterator error patterns (omission of whole words versus the omission of multiple cues from several words), synchrony of the transliterator's mouth and hand, clarity of the transliterator's speech movements, stylistic differences in positioning, handshape, etc.
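Spearman's rho, used throughout the analyses above, is equivalent to a Pearson correlation computed on the ranks of the data, which is what makes it appropriate for non-normally distributed scores. A minimal self-contained sketch (the score lists are invented, not the study's data):

```python
def rank(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

accuracy = [10, 25, 40, 55, 70, 85, 95]          # invented per-clip accuracy (%)
intelligibility = [20, 35, 30, 70, 85, 90, 100]  # invented intelligibility (%)
print(spearman_rho(accuracy, intelligibility))   # ≈ 0.964
```

In practice a statistics package's rank-correlation routine would also supply the p-value reported in the results; computing exact p-values is beyond this sketch.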
Chapter Three

Stimulus Selection

In selecting stimuli for each of the four stimulus blocks, the highest priority was to construct stimulus blocks containing a continuous distribution of the full range of possible accuracy and key word accuracy values. Following the primary goal, other selection factors included selecting clips containing a variety of experience levels and lag time values, such that each participant would view approximately the same proportion of clips for each transliterator, experience level, and lag time interval.

Using the stimulus selection procedures, four stimulus blocks were assembled. As shown in Table 2, each stimulus block consisted of approximately 225 video clips (225 clips for Stimulus Block 1, 224 for Block 2, 227 for Block 3, and 224 for Block 4). Although each stimulus block represented a transliteration of all phrases from the film narration, the total number of clips varied across blocks depending on the number of times within a block that two consecutive spoken phrases were combined into one video clip (for reasons given in the previous section).
Table 2

Stimulus Block Composition

           Total clips  Novel clips  Duplicates  Triplicates  Quadruplicates
Block 1        225          225           0           0              0
Block 2        224          147          77           0              0
Block 3        227          123          48          56              0
Block 4        224           92          57          28             47

Because it was also necessary for each stimulus block to contain all phrases from the narration in order, there were, of course, limitations in the number of clips available for selection within each of these selection factors. Most notably, fewer clips were available at some accuracy levels (especially in the 0% to 35% accuracy range) than at others. Similarly, there were limitations to the number of clips available for each transliterator because many transliterators did not cue every sentence from the lecture. Experience level limitations were due to the unequal number of novice, experienced, and veteran transliterators who participated in this study (2 novice, 1 experienced, and 9 veteran). Available lag time values were the result of measurements of transliterator behavior and could not be manipulated. As a result of these limitations, it was necessary to use some clips in more than one stimulus block. Still, even by the fourth stimulus block that was assembled, 92 of the 224 clips selected were unique clips that had not been used in any previous stimulus block. The overlap between blocks is summarized in Table 2.
Accuracy

The scatterplots in Figure 2 show the distribution of accuracy and key word accuracy values for each video clip selected within each block. The number of unique clips (shown as green diamonds) visibly decreases with the addition of each new stimulus block. Nonetheless, these figures demonstrate that a continuous range of accuracy values from 0% to 100% is well-represented in each of the stimulus blocks. In order to obtain this range, however, nearly all available clips with accuracy values from 0% to 35% were used in all four stimulus blocks, due to the low proportion of clips containing cued phrases with accuracy totals in this range.
Figure 2. Scatterplots of accuracy and key word accuracy for each stimulus block (Stimulus Blocks I-IV; accuracy on the x-axis and key word accuracy on the y-axis, each 0% to 100%). Novel video clips are shown as green diamonds, duplicate clips as blue triangles, triplicate clips as yellow squares, and quadruplicate clips as red circles (representing clips that were used in all four stimulus blocks). [Scatterplot panels not reproduced here.]

Experience

As seen in Table 3, the number of clips representing a particular experience category was similar for each of the four stimulus blocks, although the number of clips in
each of the three categories was quite different. Given that only one transliterator was classified as "experienced" and two transliterators were classified as "novice," while nine transliterators were classified as "veteran," it was not possible to select the same number of clips in each experience category. Rather, each experience category was represented proportionally by distributing clips as equally as possible among CSTs. With approximately 225 clips per stimulus block and 12 CSTs, the ideal distribution across CSTs would call for approximately 19 clips per CST. Thus, the ideal experience distribution for each stimulus block would be 38 clips in the novice category (19 clips x 2 CSTs), 19 clips in the experienced category (19 clips x 1 CST), and 171 clips in the veteran category (19 clips x 9 CSTs). As Table 3 shows, every stimulus block contained sufficient representation of both the veteran and the experienced categories as well as a high percentage of the ideal number of clips for the novice category, with a minimum of 32 novice clips per stimulus block.

In addition, Appendix B lists the number of stimulus clips per CST for each stimulus block, which confirms that the clips selected within each experience category were well-distributed across the CSTs in that category. While it was not possible to select exactly 19 clips per CST and balance all factors involved (accuracy, key word accuracy, CST, experience, and lag time), it is important to note that the number of clips per CST for all four stimulus blocks remained within the range of 13 to 26 clips, with only about 5% of the CST clip totals deviating by more than 4 clips from the ideal total of 19 clips.
Table 3

Number of Clips per Experience Category

              Ideal  Block 1  Block 2  Block 3  Block 4
Novice          38      41       35       32       33
Experienced     19      21       25       21       26
Veteran        171     169      171      178      172

Lag time

Overall, the clips available for stimulus selection contained a wide range of lag time values. These values were not uniformly distributed, as the available lag time values resulted from transliterator behavior and could not be manipulated for the purposes of the study. Clips were therefore chosen to represent the underlying distribution, with each range of lag time values represented proportionally; because a majority of the clips available for selection contained lower lag time values (from approximately 0.75 seconds to 1.75 seconds), a majority of the selected clips contained these lag time values as well. As shown in Figure 3, no one stimulus block contained a disproportionate number of clips at any specific range of lag time values. As a result, the number of clips representing a particular range of lag time values was similar for each of the four stimulus blocks, although the number of clips in each range varied widely.
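The proportional representation described above can be sketched as follows: bin the available clips' lag times into 0.5-second ranges, then size each block's allocation in proportion to the bin counts. This is an illustrative Python sketch with invented lag-time values, not the study's actual selection code:

```python
# Illustrative sketch of proportional representation across lag-time bins.
# The binning rule and the pool of lag times below are invented assumptions.
from collections import Counter

def lag_bin(lag_seconds, width=0.5):
    """Map a lag time to the lower edge of its 0.5-second bin."""
    return round((lag_seconds // width) * width, 2)

def ideal_counts(lag_times, block_size):
    """Ideal number of clips per lag-time bin, proportional to availability."""
    counts = Counter(lag_bin(t) for t in lag_times)
    total = len(lag_times)
    return {b: round(block_size * c / total) for b, c in sorted(counts.items())}

# Invented pool of available lag times (seconds):
available = [0.8, 0.9, 1.1, 1.2, 1.3, 1.6, 1.7, 2.4, 3.1, 5.2]
print(ideal_counts(available, block_size=225))
```

Because of rounding, the per-bin allocations may not sum exactly to the block size; a real procedure would reconcile the remainder against the other selection constraints.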
Figure 3. The number of clips selected for each 0.5-second range of lag time values (0.5 s through 5+ s) in each stimulus block (Blocks 1-4). [Bar chart not reproduced here; y-axis: number of clips, 0 to 80.]

Individual CST Representation

Finally, the accuracy values corresponding to the stimuli selected for each transliterator were examined in order to determine whether a full range of accuracy values was obtained per CST. Figure 4 shows that while every CST was represented by a wide range of accuracy values, only about half were represented throughout the full range of possible accuracy values. This can most likely be explained by a lack of available clips produced within certain accuracy ranges by individual CSTs. However, another possibility is that the clips selected may not have sampled the full range of an individual CST's accuracy values. If this were the case, a discrepancy between the average
accuracy of the stimuli selected for a particular CST and that CST's overall accuracy would be expected. However, Table 4 shows that the average accuracies of the stimuli selected for each CST were, in general, highly reflective of their overall accuracies.

Figure 4. Accuracy distribution of selected stimuli for each transliterator (CST1 through CST12) across all stimulus blocks. [Scatterplot not reproduced here; x-axis: accuracy, 0% to 100%; y-axis: transliterator, 1 to 12.]
Table 4

Accuracy for Selected Stimuli and Full CST Performances

Transliterator  Selected Stimuli (%)  Full CST Performance (%)
CST1                    68                     55 [a]
CST2                    67                     66 [a]
CST3                    81                     64 [a]
CST4                    40                     40 [a]
CST5                    69                     68 [b]
CST6                    71                     71 [c]
CST7                    86                     84 [d]
CST8                    51                     53 [d]
CST9                    59                     55 [d]
CST10                   47                     49 [d]
CST11                   73                     73 [d]
CST12                   90                     89 [d]

[a] Husaim & Tessler (2006)  [b] Lindsay (2006)  [c] Pelley (2006)  [d] Tessler (2007)

The only two transliterators whose accuracy was not well represented by the stimulus selection were the two novice transliterators (CST1 and CST3). In these two cases, the average accuracy of the stimuli selected was much higher than the true overall accuracy of the transliterator: the average accuracy of the stimuli selected for CST1 was 68%, 13 percentage points higher than his/her actual overall accuracy (55%), and the average accuracy of the stimuli selected for CST3 was 81%, 17 percentage points higher
than his/her true overall accuracy score (64%). This disparity was due to the high percentage of whole-word omissions produced by these two novice transliterators, who often omitted entire phrases. When a phrase was omitted, there was no clip available for selection. If these "non-clips" (skipped phrases) had been eligible for stimulus selection, the average accuracy of the stimuli selected for each of these two transliterators would have been much lower.
Chapter Four

Intelligibility Results

Table 5 summarizes the physical characteristics (accuracy, key word accuracy, and lag time) of the stimuli presented in the experiment, as well as the corresponding intelligibility results for each expert cue receiver. The information regarding physical characteristics confirms that the stimuli selected were indeed well balanced across receivers with respect to accuracy scores, key word accuracy scores, and lag times. Each participant viewed a stimulus block that averaged 61% accuracy across all clips presented (with individual clip accuracy scores varying from 0% to 100%). The average key word accuracy of the stimulus clips was also approximately the same for each participant, with overall key word averages of 75% or 76% for each participant's stimulus block. Lastly, the average lag time of the stimulus clips presented to participants varied only slightly, with average lag times per full stimulus block ranging from 1.80 seconds to 1.98 seconds, a difference of less than 10%.
Table 5

Stimulus Characteristics and Intelligibility by Receiver (Averaged Over Stimulus Block)

          Characteristics of Stimuli Received      Intelligibility
Receiver  Accuracy  Key Word      Lag Time    OM All      OM Key     TM Key
          (%)       Accuracy (%)  (sec)       Word (%)    Word (%)   Word (%)
CS01         61         75          1.98         79          83         89
CS02         61         76          1.80         72          78         83
CS03         61         75          1.98         76          84         89
CS04         61         76          1.80         72          79         84
CS05         61         75          1.84         69          75         80
CS06         61         75          1.82         74          76         81
CS07         61         75          1.84         68          70         76
CS08         61         75          1.82         65          70         74
Average      61         75          1.86         72          77         82
Range      8-100      0-100      0.34-7.31    0-100       0-100      0-100

The overall intelligibility (averaged across all transliterators and all receivers) for these stimuli was 72% for all words in the original message, 77% for key words in the original message, and 82% for key words in the transliterated message. Even though large differences in absolute performance levels were observed between cue receivers (15 percentage points between the highest and lowest intelligibility scores obtained by individual expert cue receivers in each of the three intelligibility measures), the relative ordering of these three measures was also observed in all individual cue receiver
averages. That is, all cue receivers obtained the highest overall scores for TM key word intelligibility and the lowest overall scores for OM all word intelligibility, regardless of their absolute performance levels. Moreover, the average OM intelligibility score obtained by each cue receiver was considerably higher than the average accuracy (61%) in all cases and higher than the average key word accuracy (76%) in roughly half of the cases. The average TM intelligibility scores, ranging from 74% to 89%, were even higher than the OM intelligibility scores, demonstrating that Cued Speech reception was generally high for any words that the transliterator attempted to cue.¹ Even so, TM intelligibility scores did not approach the intelligibility scores of normal hearing listeners, who received, on average, 98% of all words in the original message and 99% of key words (Tope, 2008), given audio stimulus items based on the film's narration (i.e., spoken with 100% accuracy).

¹ TM intelligibility here is averaged across CSTs and would be higher for some individual CSTs, presumably those with the highest accuracy.

Transliterator Intelligibility Differences

As Table 6 shows, the intelligibility scores for individual transliterators followed the same pattern as the overall intelligibility results: TM intelligibility scores were the highest of the three intelligibility measures, followed by OM key word intelligibility, and finally, OM all word intelligibility. In addition, intelligibility scores for individual transliterators were substantially higher than accuracy for most of the group: CST4 scored 14 percentage points higher on intelligibility than accuracy (only 40% on accuracy but 54% on all word intelligibility), CST6 scored 15 percentage points higher on
intelligibility than accuracy (71% accuracy and 86% all word intelligibility), CST9 and CST11 scored 17 percentage points higher on intelligibility than accuracy (59% and 73% on accuracy, but 76% and 90% on all word intelligibility, respectively), and CST8 scored 23 percentage points higher on intelligibility than accuracy (51% all word accuracy, but 74% all word intelligibility). Interestingly, the CST with the highest TM intelligibility (CST1) was not one of the transliterators with the highest accuracy or highest OM intelligibility. This large difference between OM and TM intelligibility scores for CST1 can be attributed to the high amount of paraphrasing used by this transliterator (a novice).
Table 6

Stimulus Characteristics and Intelligibility by CST (Averaged Over Stimulus Block)

        Characteristics of Selected Stimuli      Intelligibility
CST     Accuracy  Key Word      Lag Time    OM All     OM Key     TM Key
        (%)       Accuracy (%)  (sec)       Word (%)   Word (%)   Word (%)
CST1       68         70          3.30         66         66         95
CST2       67         81          1.66         75         79         79
CST3       81         85          3.42         76         82         86
CST4       40         59          3.41         54         66         79
CST5       69         85          1.92         77         86         88
CST6       71         84          1.10         86         88         88
CST7       86         95          1.13         87         91         91
CST8       51         75          1.21         74         78         78
CST9       59         75          1.10         76         79         79
CST10      47         67          0.76         52         60         60
CST11      73         85          1.41         90         94         93
CST12      90         96          1.10         87         91         91

Although each expert cue receiver saw a full stimulus block with the same accuracy values on average (i.e., 61%), Table 6 shows that the average accuracy for the stimuli selected from each CST was not the same. This was because the accuracy scores available from each CST were different. Given these differences, the transliterators' average intelligibility scores provide a rough indication of the relationship between
accuracy and intelligibility. In general, the transliterators' intelligibility scores followed accuracy, with transliterators who had higher accuracy averages also obtaining higher intelligibility scores and transliterators with lower accuracy averages obtaining lower intelligibility scores. However, this pattern did not hold for all transliterators.

Deviations from the pattern may be easiest to see by comparing the rank order of transliterators based on accuracy averages with the rank order based on average intelligibility. These rankings are shown in Table 7. While group ranking was generally preserved between accuracy averages and intelligibility scores for a majority of the transliterators, both of the novice transliterators had lower intelligibility rankings than accuracy rankings. While most transliterators exhibited higher intelligibility than accuracy, the accuracy averages of these two transliterators were nearly the same as their intelligibility scores, with CST1 averaging 68% in accuracy but only 66% in all word intelligibility, and CST3 averaging 81% in accuracy and only 82% in all word intelligibility. As a result, CST1's intelligibility ranked lower than that of three transliterators with lower accuracy rankings, and CST3 ranked lower in intelligibility than four transliterators with lower accuracy averages. In contrast, CST9 and CST11 had intelligibility ranks that were considerably higher than expected given their accuracy ranks. With CST9 averaging only 59% in accuracy but 76% in all word intelligibility, and CST11 averaging 73% in accuracy but a high 90% intelligibility, each was ranked higher in intelligibility than three transliterators with higher accuracy ranks.
Table 7

Transliterator Rankings by Accuracy, Intelligibility, and Subjective Receiver Ratings

Ranking   Accuracy       OM All Word        Subjective Receiver
                         Intelligibility    Ratings
1st       CST12 (90%)    CST11 (90%)        CST7 (9.375)
2nd       CST7 (86%)     CST12 (87%)        CST11 (8.75)
3rd       CST3 (81%)     CST7 (87%)         CST12 (6.875)
4th       CST11 (73%)    CST6 (86%)         CST3 (5)
5th       CST6 (71%)     CST5 (77%)         CST6 (4.375)
6th       CST5 (69%)     CST9 (76%)         CST5 (3.75)
7th       CST1 (68%)     CST3 (76%)         CST1 (2.5)
8th       CST2 (67%)     CST2 (75%)         CST2 (2.5)
9th       CST9 (59%)     CST8 (74%)         CST9 (1.875)
10th      CST8 (51%)     CST1 (66%)         CST4 (1.25)
11th      CST10 (47%)    CST4 (54%)         CST8 (0)
12th      CST4 (40%)     CST10 (52%)        CST10 (0)

Table 7 also shows subjective rankings for each transliterator, derived from receiver ratings. To obtain the rankings, point values were assigned to each participant's responses (1.25 points for each "Very Comfortable" rating, 0.625 points for each "Okay" rating, and 0 points for each "Concerned" rating), yielding a composite rating ranging from 0 (when all eight receivers rated the transliterator "Concerned") to 10 (when all eight
receivers rated the transliterator "Very Comfortable"). For the eight CSTs who obtained similar rankings in both accuracy and intelligibility, subjective rankings based on receiver ratings were also similar to these two rankings. Of the remaining four transliterators, three had subjective rankings that were similar to their accuracy rankings (including CST1 and CST3, who ranked lower on intelligibility than accuracy, and CST9, who ranked higher on intelligibility than accuracy). However, the subjective ranking of CST11 (who ranked first in intelligibility despite ranking fourth in accuracy) was most similar to his/her intelligibility ranking.

More details regarding each expert cue receiver's ratings and subjective impressions of each of the twelve transliterators are summarized in Table 8. The data show a high amount of receiver agreement in ratings of the transliterators with the highest and lowest overall intelligibility. For example, the transliterators with the best intelligibility scores, CST11, CST7, and CST12, received the highest subjective rating scores and were chosen by a majority (if not all) of receivers as "highly effective" transliterators. Similarly, the transliterator with the lowest intelligibility score, CST10, also received the worst subjective rating score and was unanimously chosen by all eight receivers as a "highly ineffective" transliterator. The greatest amount of variability in participant ratings was found for CST3 and CST6, for whom some participants were "Very Comfortable" with the message conveyed, some found the message to be "Okay," and some participants were "Concerned [They] Missed Something" from the CST's transliteration.
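The composite-rating scheme described above is simple enough to state exactly: each of the eight receivers contributes 1.25 points for a "Very Comfortable" rating, 0.625 points for "OK," and 0 points for "Concerned." A short Python sketch (the rating lists are illustrative, though the example reproduces CST11's counts from Table 8):

```python
# Point values per receiver response, following the rating scheme described above.
POINTS = {"Very Comfortable": 1.25, "OK": 0.625, "Concerned": 0.0}

def composite_rating(responses):
    """Sum point values over all receiver responses (0-10 for eight receivers)."""
    return sum(POINTS[r] for r in responses)

# CST11's counts from Table 8: six "Very Comfortable", two "OK", zero "Concerned".
print(composite_rating(["Very Comfortable"] * 6 + ["OK"] * 2))  # → 8.75
```

The point values are chosen so that eight unanimous "Very Comfortable" responses yield exactly 10, making the composite directly comparable across transliterators.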
Table 8

Receiver Ratings of Each CST, Ranked by Intelligibility Scores

                                Ratings (number of receivers)       Number of times selected
Transliterator     Composite    Very          OK           Concerned   Highly      Highly
(Intelligibility)  Rating       Comfortable   (0.625 pts)  (0 pts)     Effective   Ineffective
                   (0-10 pts)   (1.25 pts)
CST11 (90%)          8.75           6             2            0           8            0
CST7 (87%)           9.375          7             1            0           8            0
CST12 (87%)          6.875          4             3            1           6            1
CST6 (86%)           4.375          2             3            3           2            0
CST5 (77%)           3.75           0             6            2           1            2
CST3 (76%)           5              2             4            2           2            0
CST9 (76%)           1.875          0             3            5           0            3
CST2 (75%)           2.5            1             2            5           1            4
CST8 (74%)           0              0             0            8           0            8
CST1 (66%)           2.5            0             4            4           0            3
CST4 (54%)           1.25           0             2            6           0            4
CST10 (52%)          0              0             0            8           0            8

Finally, a few other idiosyncrasies in receiver ratings should be mentioned. First, while CST12 was selected by 6 out of 8 receivers as "highly effective," 1 receiver found this transliterator to be "highly ineffective." Also, CST8 received an overall receiver
rating score of 0 (indicating that all 8 receivers felt they missed something from the message due to the CST's performance) and was unanimously selected as a "highly ineffective" transliterator, despite achieving 74% intelligibility. This is in contrast to two transliterators with lower intelligibility scores (CST1 and CST4, who scored 66% and 54%) who were rated more highly (at least some of the receivers felt "OK" with the message conveyed by CST1 and CST4) and were less frequently selected as "highly ineffective" transliterators.

Accuracy-Intelligibility Functions

As described earlier, the primary objective of this paper is to characterize the relationship between Cued Speech transliterator accuracy and the intelligibility of the messages received by expert cue receivers. Toward this end, scatterplots relating the accuracy and intelligibility of individual stimulus items for three combinations of measures were examined (see Figure 5): 1) accuracy vs. OM all word intelligibility, 2) accuracy vs. OM key word intelligibility, and 3) key word accuracy vs. OM key word intelligibility. Although the data points in each of these three scatterplots are widely distributed, they do suggest a positive relationship between accuracy and intelligibility. Spearman's rank order correlations were performed, confirming that a statistically significant positive correlation exists between accuracy and intelligibility for each of the three functions. (This non-parametric test of correlation was employed because the data are not normally distributed.) As is visible on the graphs, the distribution of the accuracy-intelligibility functions is skewed toward 100%, given that a substantial number of
stimuli reached the maximum intelligibility values.

The accuracy-intelligibility correlation was strongest for the relationship between accuracy and all word intelligibility of the original message and weakest for the relationship between accuracy and key word intelligibility of the original message, with Spearman's rho values of 0.478 for all word accuracy vs. OM all word intelligibility (p = 0.000), 0.384 for all word accuracy vs. OM key word intelligibility (p = 0.000), and 0.472 for key word accuracy vs. OM key word intelligibility (p = 0.000). The variation in accuracy accounted for 26% of the variation in OM all word intelligibility scores and 22% of the variation in OM key word intelligibility scores, while the variation in key word accuracy accounted for 25% of the variation in OM key word intelligibility scores.

In order to help analyze the concentration of data points within each intelligibility scatterplot, Figure 5 also shows the mean and mode intelligibility values for each 20-percentage-point accuracy interval from 0 to 100%. The mean reflects the general underlying linear relationship between accuracy and intelligibility. The mode, however, shows that the most frequently occurring intelligibility values for a given accuracy range were sometimes far from the mean. This discrepancy suggests that other trends beyond the linear accuracy-intelligibility relationship may exist but are difficult to see due to the copious number of overlapping data points on each graph.
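The "variation accounted for" percentages above are R-squared values from least-squares linear fits (they appear to correspond to the fits reported with Figure 5). A minimal sketch of that computation, with invented data points rather than the study's scores:

```python
def linear_fit_r2(x, y):
    """Least-squares line y = slope*x + intercept, plus its R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot  # R^2 = proportion of variance explained

# Invented (accuracy, intelligibility) pairs for illustration:
x = [10, 30, 50, 70, 90]
y = [30, 45, 70, 80, 95]
slope, intercept, r2 = linear_fit_r2(x, y)
print(f"y = {slope:.3f}x + {intercept:.2f}, R^2 = {r2:.3f}")
```

Note that R-squared from a linear fit and the square of Spearman's rho are related but distinct quantities; the percentages in the text track the former.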
[Figure 5 appears here: scatterplots of the accuracy-intelligibility relationship for the three measure combinations, each with a linear fit, mean, and mode. Panel 1: accuracy vs. all word OM, y = 0.589x + 36.3, R2 = 0.264. Panel 2: accuracy vs. key word OM, y = 0.546x + 44.1, R2 = 0.216. Panel 3: key word accuracy vs. key word OM, y = 0.680x + 26.0, R2 = 0.253.]

Figure 5. The accuracy-intelligibility relationship, with mean and mode intelligibility scores shown for each 10-point accuracy interval.
To further examine trends in concentrations of the data points, Figure 6 shows the likelihood that, given a data point with a certain accuracy value, receivers were able to figure out at least 70% of the message. The 70% threshold was chosen for this graph because in educational settings it represents a letter grade of "C," the minimum passing grade. The likelihood value was determined by calculating the proportion of all data points in each 10-point accuracy interval that had greater than 70% intelligibility. This likelihood value thus approximates the probability that a receiver obtains an intelligibility score at or above 70% for any stimulus item in that accuracy interval. As such, the likelihood values provide some indication of the intelligibility mode as well as the intelligibility mean for each accuracy interval.

The likelihood function appears somewhat sigmoidal in shape, showing little change in intelligibility for changes in accuracy at the extreme ends of the accuracy scale (0-40% accuracy and 75-100% accuracy) and a steeper slope in the middle of the scale, where increases in accuracy cause more dramatic increases in intelligibility. Moreover, all three accuracy-intelligibility likelihood functions share characteristics of the same general shape (with similar slope and similar left-right shifts). However, the three functions appear slightly different at the higher accuracy values, as the accuracy vs. key word OM intelligibility function appears to reach a plateau in intelligibility scores for accuracy values above 60%, while the other two functions continue to increase in intelligibility for this accuracy range.
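The likelihood computation described above is straightforward to sketch. This is an illustrative reconstruction with invented data, not the study's code; the threshold and bin width follow the description in the text:

```python
# Illustrative sketch: for each 10-point accuracy interval, compute the
# proportion of stimulus items whose intelligibility exceeds a threshold
# (70% here, as in Figure 6). Data values are invented.

def likelihood_by_bin(accuracy, intelligibility, threshold=70.0, width=10.0):
    """Map each accuracy bin's lower edge to the fraction of items
    in that bin with intelligibility above `threshold`."""
    bins = {}
    for acc, intel in zip(accuracy, intelligibility):
        lo = min(int(acc // width) * width, 100.0 - width)  # fold 100% into top bin
        bins.setdefault(lo, []).append(intel > threshold)
    return {lo: sum(hits) / len(hits) for lo, hits in sorted(bins.items())}

accuracy        = [12, 18, 35, 38, 55, 58, 72, 75, 78, 95, 98, 100]
intelligibility = [20, 75, 40, 72, 68, 80, 85, 60, 90, 95, 88, 100]

for lo, p in likelihood_by_bin(accuracy, intelligibility).items():
    print(f"{lo:>5.0f}-{lo + 10:.0f}% accuracy: likelihood {p:.0%}")
```

Plotting these per-bin proportions against the bin centers produces the sigmoidal curves of Figure 6.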
[Figure 6 appears here: three likelihood plots (accuracy vs. all word OM, accuracy vs. key word OM, and key word accuracy vs. key word OM), each showing curves for the >70%, >75%, and >80% intelligibility thresholds.]

Figure 6. The accuracy-intelligibility relationship, with the proportion of data points that reach 70% or higher intelligibility shown for each 10-point accuracy interval.
Individual differences

Accuracy-intelligibility functions for individual receivers and transliterators are shown in Appendix C. Figures C1 and C2 show similar accuracy-intelligibility functions for all receivers, with differences mostly in left-right shift indicating differences in absolute performance levels between receivers. Figures C3 and C4 demonstrate that using stimulus items from many individual transliterators was helpful in widening the range of accuracy values that could be used to characterize the overall relationship between accuracy and intelligibility. Because stimulus items from some transliterators were restricted in accuracy range, the full accuracy-intelligibility function could not have been characterized with data from these transliterators only. For example, the range of accuracies utilized as stimulus items for CST11 and CST12 was restricted to higher accuracy values only, causing a ceiling effect in intelligibility (around 90% for CST11 and 80% for CST12). A limited range in intelligibility data also occurs for CST2, CST6, CST9, and CST10. However, the data for each of these transliterators show neither a floor nor a ceiling effect and demonstrate a positive relationship between accuracy and intelligibility, with each characterizing a different portion of the overall accuracy-intelligibility psychometric function. Together, and with data from the remaining transliterators, a large portion of the accuracy-intelligibility function was characterized.

Effect of experience. In order to examine the effect of experience on the shape of the accuracy-intelligibility psychometric functions, the three likelihood functions were replotted, with data separated by experience group (veteran, experienced, and novice). As shown in Figure 7, the positive relationship between accuracy and intelligibility is
apparent for both the novice and veteran groups, but no relationship is apparent for the experienced group. However, the experienced group consisted of only one transliterator, and the number of stimulus items available from any one transliterator is insufficient to characterize the relationship. Although it is not possible to determine what relationship exists between accuracy and intelligibility for experienced transliterators, sufficient data are available to suggest similarities and differences between the novice and veteran groups. In comparing the novice and veteran accuracy-intelligibility functions, their slopes appear very similar, but where 70% accuracy corresponds to 50% intelligibility in the novice function, only 50% accuracy is needed to achieve 50% intelligibility in the veteran function. Thus, while intelligibility improves with accuracy at the same rate for both groups, the accuracy-intelligibility function of the novices is shifted to the right in comparison to veterans, indicating that task difficulty is higher for receiving information from novice transliterators than veteran transliterators, even when accuracy is controlled.
[Figure 7 appears here: accuracy vs. all word OM likelihood plot (reception threshold >70%), with separate curves for the novice, experienced, and veteran groups.]

Figure 7. Accuracy-intelligibility likelihood functions plotted for each experience category (each showing the proportion of data points that reach 70% or higher intelligibility for a given accuracy interval).

Effect of lag time. As predicted, no large effects of lag time on intelligibility were found in this study. Figure 8 shows no clearly definable differences in the shape of the accuracy-intelligibility function when plotted separately for different lag time ranges. No difference in task difficulty is shown for these lag time ranges, as evidenced by the lack of left-right shift between most of the functions. It is possible that the functions for lag time ranges of 0-1 seconds and 1-2 seconds may be somewhat to the left of the other functions (suggesting that receiving information from transliterators with shorter lag times may be an easier task than those with longer lag times, even when accuracy is controlled), or that there is a difference in slopes between the lag time ranges graphed
(suggesting differences in the amount of variability associated with various lag time ranges). However, additional statistical analyses are needed to investigate these possible differences, and it is apparent that the differences, if any exist, are small.

[Figure 8 appears here: accuracy vs. all word OM likelihood plot (reception threshold >70%), with separate curves for the 0-1, 1-2, 2-3, 3-4, 4-5, and >5.0 second lag time ranges.]

Figure 8. Accuracy-intelligibility likelihood functions plotted for each 1-second lag time range (each showing the proportion of data points that reach 70% or higher intelligibility for a given accuracy interval).

Lag Time-Intelligibility Function

Finally, the relationship between lag time and intelligibility was investigated by plotting lag time as the independent variable and OM all word intelligibility score as the dependent variable for each stimulus item in order to construct a scatterplot of the lag time-intelligibility relationship. A weak, negative relationship was observed, and the strength of the relationship between the two variables was measured by conducting a
Spearman's rank order correlation. However, no statistically significant correlation was found (Spearman's rho value of -0.041, p=0.319). Although no linear relationship was found, the density of data points suggested a curvilinear relationship.

To investigate this possibility, the likelihood function, representing the likelihood that stimulus items have an intelligibility score greater than 70%, was again calculated (see Figure 9). The general shape of the lag time-intelligibility function shows that intelligibility scores of greater than 70% were most likely to occur when lag times were between 1 and 1.5 seconds. Of the stimulus items with lag times in this range, 70% to 75% were associated with intelligibility scores of greater than 70%; this range can therefore be considered an optimal lag time range for CS transliterators. As lag time increased beyond this range, the decline in intelligibility was somewhat steep for lag times between 1.5 and 2 seconds (decreasing from 75% likelihood that the intelligibility cutoff is achieved in the optimal lag time range to only 55% likelihood when the lag time is 2 seconds). Further increases in lag times above 2 seconds, however, showed a more gradual decline in intelligibility scores (ultimately dropping as low as 45% likelihood at 4-second lag times).

The steepest decline in intelligibility occurred for lag times that were shorter than those in the optimal lag time range. Stimulus items with lag times of less than 1 second had the lowest likelihood of reaching the intelligibility threshold (only 35% of stimulus items with lag times below 1 second had intelligibility scores of 70% or greater, compared to 70% of stimulus items with 1-second lag times, from the optimal lag time range).
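The same binning idea, applied over 0.5-second lag time intervals, can locate an optimal range directly. The following is a sketch with invented data, not the study's analysis:

```python
# Illustrative sketch: bin stimulus items into 0.5-second lag-time
# intervals and report the interval where intelligibility most often
# exceeds 70%, mirroring the "optimal lag time" analysis of Figure 9.
# All data values are invented.

def optimal_lag_bin(lag_times, intelligibility, threshold=70.0, width=0.5):
    bins = {}
    for lag, intel in zip(lag_times, intelligibility):
        lo = round(int(lag / width) * width, 1)  # lower edge of the item's bin
        bins.setdefault(lo, []).append(intel > threshold)
    likelihood = {lo: sum(h) / len(h) for lo, h in bins.items()}
    best = max(likelihood, key=likelihood.get)
    return best, likelihood[best]

lags            = [0.4, 0.7, 1.1, 1.2, 1.4, 1.7, 2.3, 3.1, 4.2]
intelligibility = [40, 55, 85, 90, 75, 68, 60, 66, 50]

lo, p = optimal_lag_bin(lags, intelligibility)
print(f"optimal lag bin: {lo}-{lo + 0.5} s (likelihood {p:.0%})")
# → optimal lag bin: 1.0-1.5 s (likelihood 100%)
```

With the study's real data, this per-bin likelihood is what peaks at 70-75% in the 1 to 1.5 second range.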
[Figure 9 appears here: lag time (0-8 sec) vs. all word OM likelihood plot, with curves for the >70%, >75%, and >80% intelligibility thresholds.]

Figure 9. Lag time-intelligibility likelihood functions, each showing the proportion of data points that reach 70% or higher intelligibility scores for each 0.5-second lag time range.
Chapter Five

Discussion

The results of this study show that message intelligibility for typical Cued Speech transliterators is 72% of all words in the original message, on average, when conveying educational materials designed for high school settings. Intelligibility is higher for key words in the original message (77% on average) and highest for key words in the transliterated message (82% on average). Yet even the highest intelligibility score (TM key word) for the receivers who obtained the highest intelligibility (CS01 and CS03), 89%, was less than the analogous measure obtained from normal hearing listeners (when presented with audio stimulus items of 100% accuracy). The intelligibility of key words in the original spoken message was 99.8% for monolingual listeners and 99% for bilingual listeners (Tope, 2008). To understand the causes of this intelligibility gap between normal hearing listeners and expert cue receivers, it is first necessary to identify the factors affecting message intelligibility of Cued Speech transliterators and quantify their contributions to intelligibility.

In this study, the primary factor under investigation for its effect on intelligibility was accuracy. As hypothesized, the relationship between accuracy and intelligibility was positive and nonlinear. Plots of accuracy-intelligibility psychometric functions constructed from the data collected showed that as accuracy increased, intelligibility increased. The same accuracy-intelligibility relationship was generally exhibited when plotted by receiver and by transliterator, though there were individual differences in the
left-right shift of the resulting psychometric functions.

A second hypothesis of this paper was that the shape of the accuracy-intelligibility functions would be affected by experience, with flatter slopes and left shifts in the functions for veteran transliterators as compared to novices. This hypothesis proved partially true for Cued Speech transliterators. The psychometric functions of veteran transliterators were shifted to the left, as compared with novices, but there was no significant difference in the slopes of the functions for either group. Unlike the novices and veterans, the "experienced" category showed no relationship between accuracy and intelligibility. With only one transliterator (CST8) categorized as "experienced" rather than as a "novice" or "veteran," however, it is unknown whether s/he is typical of the middle experience category or an outlier. Data from more transliterators in this category will be necessary before the accuracy-intelligibility relationship for experienced transliterators can be determined.

Another hypothesis of this paper was that the effect of lag time on the accuracy-intelligibility psychometric function would be minimal (because lag time changes are more likely to affect accuracy alone, rather than the relationship between accuracy and intelligibility). As predicted, no effect of lag time was found on the accuracy-intelligibility psychometric function. However, lag time (the amount of time the transliterator lags behind the speaker) was an independent factor associated with changes in transliterator message intelligibility. Specifically, the psychometric function relating lag time and intelligibility indicated an "optimal" lag time range (where intelligibility scores greater than 70% occurred most frequently) for lag times between 1 and 1.5
seconds. Given that lag time changes are most likely to affect accuracy, this range of lag time values is expected to be associated with higher accuracy values as well.

Finally, it was hypothesized that even when experience and lag time were controlled, variability due to other factors would be evident in the psychometric function. As such, two stimulus items with the same accuracy could have different intelligibility scores, due to differences in cueing rate, speechreadability, prosody, or other factors. This hypothesis was true: for any given accuracy value, there was a large range in the resulting intelligibility values, demonstrating a high degree of variability unaccounted for by accuracy alone.

Other Sources of Variability

Because accuracy only accounted for approximately 26% of the variability in intelligibility, it is apparent that other factors also play a role in intelligibility. Many such factors are likely to stem either from differences in transliterators or differences in receivers. Therefore, possible sources of both transliterator variability and receiver variability merit further consideration. Sources of transliterator variability include any variability in intelligibility that is due to subtle differences in behavior, style, or performance of transliterators (i.e., between two transliterators or even within a single transliterator's performance). Sources of receiver variability, on the other hand, include the variability in intelligibility that occurs due to receiver-specific factors that affect message reception either between two receivers (e.g., receptive cueing fluency) or within a receiver (e.g., attention).
Transliterator variability. Possible sources of transliterator variability are numerous and include the transliterator's cueing mechanics (handshape formation, placements, and transitional cueing movements), speechreadability, error types, facial expressions, cueing rate, synchronization between mouth and cues, and prosody (i.e., timing and emphasis). Other sources of transliterator variability may include intra-transliterator factors (e.g., fatigue, attention, nervousness), differences in cue selection (e.g., cueing unstressed /i/, as in "funny," as /i/ versus /I/), and dialect (dialect may be apparent in selection of cues as well as mouth movements).

In this study, some of these factors obviously played a role in transliterator intelligibility, at least in the cases of the four transliterators whose intelligibility did not closely follow accuracy. CST11, for example, ranked fourth in accuracy at 73%, but achieved the highest intelligibility (90%) of any transliterator, surpassing even CST12, the transliterator with the highest accuracy (90%), whose intelligibility was 87%. It should, however, be noted that the accuracy difference between these two transliterators may be somewhat exaggerated. The reason for this is that the accuracy measurements were completed on a strict grading system, with no credit given for substitutions, even when part of the target cue (handshape or placement) was correctly produced. While CST12 produced few substitutions, CST11 often produced errors of substitution that resulted from a tendency to "hypocue" (approximate, but not fully achieve, the intended handshape or placement). As such, CST11 was given no credit for 22% of his/her performance due to errors of substitution. If these substitutions were each partially correct, CST11 would have been deserving of an accuracy score of 84%, instead of 73%,
under the strict grading system. Although such a scenario would account for part of CST11's higher intelligibility, it still does not explain why CST11 would be more intelligible than CST12, whose accuracy was 90%. Therefore, CST11 must have achieved high intelligibility results by capitalizing on other skills.

A close inspection of CST11's performance reveals several strengths which may have played a role in increased intelligibility: 1) highly visible/clear speech, 2) excellent facial expressions and other non-manual information (use of eyebrows, head leaning, and showing lists on the hand) to show questions, emphasize important points, and convey the tone of the message, and 3) effective use of available time while still keeping up with the message (i.e., CST11 capitalized on speaker pauses, slowing down slightly in order to better show syllable, word, or sentence stress). While CST12 demonstrated visible/clear speech and facial expressions, he/she did not capitalize on speaker pauses, but instead followed immediately behind the spoken messages, demonstrating pauses equal to speaker pauses. As a result, the message prosody and word stress were less pronounced, and the cueing rate was effectively faster. Because of this difference, CST11 had a longer average lag time (1.4 seconds) than CST12 (1.1 seconds, a lag time average that is at the lower end of the optimal lag time range of 1 to 1.5 seconds). It is therefore possible that CST11 ranked highest in intelligibility and was unanimously identified as a "highly effective" CST by all eight receivers because of better message prosody, slower effective cueing rate, and/or more optimal lag time.

Two other transliterators whose performance may lend insight into sources of transliterator variability are CST1 and CST3, the two novice transliterators whose
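The strict versus partial-credit distinction described above can be made concrete with a small scoring sketch. This is a hypothetical illustration (the cue values and half-credit weight are invented, not the study's actual rubric), treating each cue as a (handshape, placement) pair:

```python
# Hypothetical scoring sketch (invented rubric and data): strict scoring
# credits only exact cue matches, while partial credit awards half a
# point when the handshape or the placement alone is correct.

def score_cues(target, produced, partial_credit=False):
    total = 0.0
    for (t_hand, t_place), (p_hand, p_place) in zip(target, produced):
        if (t_hand, t_place) == (p_hand, p_place):
            total += 1.0  # fully correct cue
        elif partial_credit and (t_hand == p_hand or t_place == p_place):
            total += 0.5  # substitution with one correct component
    return 100.0 * total / len(target)

target   = [(5, "side"), (5, "throat"), (2, "throat"), (3, "side")]
produced = [(5, "side"), (5, "chin"), (2, "throat"), (1, "side")]

print(score_cues(target, produced))                       # strict → 50.0
print(score_cues(target, produced, partial_credit=True))  # partial → 75.0
```

Under such a scheme, a transliterator who "hypocues" (producing substitutions that preserve one component of the target) scores noticeably higher, which is the sense in which CST11's 73% could correspond to roughly 84% with partial credit.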
intelligibility was worse than expected given their accuracy scores. While veteran transliterators generally achieved intelligibility scores that were significantly higher than their accuracy averages, the intelligibility of these two transliterators was below their accuracy averages, with CST1 averaging 68% accuracy and 66% intelligibility and CST3 averaging 81% accuracy and 76% intelligibility. Aside from experience level, specific factors noted by the experimenter that were likely to have negatively impacted intelligibility for both of these two transliterators include slow cueing rate, misleading facial expressions (concentrating, confused, or discouraged facial expressions), and poor timing and rhythm within words and sentences (extraneous pausing, poor demonstration of word or syllable emphasis, and poor conveyance of the importance of words). The slow cueing rate of both novices resulted in average lag times much longer than the optimal lag time range of 1 to 1.5 seconds (CST1 had an average lag time of 3.4 seconds and CST3 had an average lag time of 3.5 seconds). Additionally, it is possible that the cueing rates were too slow at times and may have caused receivers to have difficulty keeping words in working memory. Between CST1 and CST3, CST1 generally exhibited clearer visible speech, while CST3 was slightly less clear, held his/her hand at an atypical angle, and had many false starts. Combined with extraneous pausing, these false starts resulted in a misleading rhythm that may have obscured word boundaries (in one case, a receiver perceived "surgically precise" as "surge eucalyptus"). All or most of these transliteration behaviors are likely to reflect the novice transliterators' inability to cope with the cognitive load and/or physical demands of the task of transliterating at a conversational speed. More practice transliterating at faster speeds and/or experience
would likely improve the intelligibility scores of these two novices. Of course, additional experience and practice alone would not necessarily improve all of the above factors, unless more training was also provided to raise awareness of these issues.

Finally, discrepancies between the intelligibility score and receiver ratings for one transliterator (CST8) may also be an indication of additional sources of transliterator variability. Based on receiver ratings, CST8 ranked last (tied with CST10; both received a composite rating of 0 on a scale of 0 to 10) and was unanimously identified as a "highly ineffective" transliterator, despite achieving 74% intelligibility and ranking 9th out of the 12 transliterators in intelligibility. Several transliteration behaviors were noted as possible factors that may explain why the receiver impressions for this transliterator were poorer than would be expected based on intelligibility alone. First, CST8 often did not show visibly discernible placements, as his/her hand consistently remained in front of the chin (for side, throat, and chin cues) during a majority of the transliteration, regardless of the consonant or consonant-vowel combination being produced. Many of the receivers complained about this aspect of CST8's cueing, both on the survey and in conversations at the breaks during the experiment. Second, CST8 regularly used unusual and sometimes misleading mouthshapes, frequently producing lip rounding in words that should not contain it (for example, "slightest"). Next, CST8 exhibited poorer synchronization between mouth and cues than other transliterators, which became apparent when his/her performance was edited into individual sentence video clips. CST8 was the transliterator with the highest number of combined clips due to poor break points between phrases; frequently, the transliterator's mouth was still articulating the
sound(s) of the previous sentence while beginning to cue a word from the next sentence, or vice versa. Lastly, CST8's cue pacing and lack of facial expressions appeared less effective at conveying the natural rhythm and stress of speech than other transliterators.

Although CST8 still produced a message that was 74% intelligible overall, receivers were more aware of CST8's issues (or more bothered by them) than they were for two other transliterators with lower intelligibility: CST1 had only 66% intelligibility and CST4 had only 54% intelligibility, but both were only identified as "highly ineffective" transliterators by half the group, whereas CST8 was unanimously chosen by all eight receivers as "highly ineffective." This difference in ratings, which cannot be attributed to differences in intelligibility, suggests either a conscious level of awareness of CST8's shortcomings or a general frustration on the part of the receiver when viewing CST8. While the other two transliterators were less intelligible, it is likely that the receiver was sure about what cues these transliterators produced, whether the cues were correct or in error. However, viewing CST8's transliteration likely left the receivers with more uncertainty regarding what cues had been produced, causing them frustration and/or resulting in a higher cognitive load as they tried to determine what they were supposed to receive from CST8.

Receiver variability. Possible sources of receiver variability include the receiver's experience and comfort level with receptive cueing, as well as processing strategies used in cue reading. Because deaf individuals encounter more diverse communicative environments than do hearing individuals, these individuals are likely to have less consistent exposure to their chosen communication modes than hearing individuals.
While at least one participant (CS01) reported using cueing all the time with most members of his/her family, several of the receivers in this study reported that they are now (in their adult lives) primarily in noncueing environments, either with hearing people who do not cue or with deaf people in settings where sign language is the primary mode of communication (for work or with deaf friends/spouses who do not cue). At least one participant (CS03) reported feeling "rusty" with receptive Cued Speech, saying s/he relied heavily on speechreading during the testing. When receivers are less experienced with receptive cueing or have reduced comfort levels due to lack of recent use, more emphasis may need to be placed on clearer speech movements (to aid in speechreading), slower rate, and possibly other stylistic differences.

Regardless of comfort level, there may also be differences between cue receivers in processing strategies used for cue reading. Some receivers may rely more heavily on information from the lips (and prefer transliterators who exhibit a high degree of lip clarity), while others may rely more heavily on information from the cues (and prefer transliterators who exhibit a high degree of cue clarity). The former group would be expected to make errors more frequently that are consistent with the lips but incongruent with the cues produced, while the latter group would be expected to do the reverse (decoding the cues with less reliance on mouth clarity, more frequently making errors that are consistent with the cues but incongruent with the mouthshapes produced).

Of course, for all receivers, some of both error types may occur, and both types of errors were indeed noted in this study. For example, CS01 perceived the word "ficus" (5s5t2t3s) as "fights" (5s-5t5s3s). Both words are similar visually based on mouthshapes
alone, but the cues distinguish the words; thus, CS01 followed the lips more than the cues, resulting in this error. On the other hand, CS03 followed the cues, rather than the lips, when perceiving the word "life" (cued 6s-5t5s) as "light" (also cued 6s-5t5s); these words are cued identically, but it is not difficult to tell them apart through speechreading. Although both types of errors were observed, a cursory inspection of receiver responses suggests that the overwhelming majority of the errors made by the eight receivers in this study were such that responses tended to correspond more to the lips than the cues. However, generalizations about the receivers' processing strategies are difficult to make because errors in cue production (made by the transliterator) may have influenced receiver errors, even for receivers who use a processing strategy that typically relies more heavily on cues than on mouthshapes (analyzing cue-by-cue accuracy data to determine the type of transliterator errors made when words were not transliterated with 100% accuracy and aligning this information with receiver responses was beyond the scope of the present study).

Role of Training and Certification

Because a wide range of performances (in accuracy and in resulting intelligibility) was exhibited for the transliterators in the veteran category, a closer examination of the experience backgrounds of the veterans is warranted. Nine of the twelve transliterators recruited for this study qualified as "veteran" transliterators (defined as transliterators with the highest level of certification and/or more than 5 years work experience), and Table 9 illustrates that there were substantial differences between these individuals with
respect to their years of transliterating experience, current weekly hours transliterating, and relevant certifications.

Table 9. Accuracy, Intelligibility, and Experience Profiles for Veteran CSTs, Ranked by Intelligibility

        Work Experience  Hours Per Week   Relevant                            Accuracy  Intelligibility
        as a CST         Transliterating  Certifications                      (%)       (%)
CST11   6 yrs.           5 hrs.           None                                73        90
CST7    9 yrs.           30 hrs.          Nationally Certified (TSC)          86        87
CST12   20 yrs.          3 hrs.           Nationally Certified (TSC)          90        87
CST6    15 yrs.          30 hrs.          Educational CST Certified by State  71        86
CST5    15 yrs.          35 hrs.          Educational CST Certified by State  69        77
CST9    15 yrs.          32.5 hrs.        None                                59        76
CST2    10 yrs.          35 hrs.          Educational CST Certified by State  67        75
CST4    22 yrs.          35 hrs.          Educational CST Certified by State  40        54
CST10   18 yrs.          35 hrs.          None                                47        52

Although no one experience factor alone appears to explain differences in intelligibility, several important trends are evident. First, the two veteran transliterators
with the fewest years of experience (CST7 and CST11) were among the highest in intelligibility, while the two veteran transliterators with the lowest intelligibilities (CST4 and CST10) were two of the three most experienced veterans. One explanation of the intelligibility differences between these groups of transliterators is that training methods may have improved over the last 20 years. This explanation seems likely considering that the first formal Cued Speech transliterator training classes were not offered until 1985 (Krause, Schick, & Kegl, in press). If the training that transliterators received approximately 20 years ago was indeed more varied, and continuing skill development training was unavailable, this would also explain why the three veterans with the highest number of years of experience varied from among the most intelligible (CST12) to among the least intelligible (CST4, CST10).

Another factor which may explain intelligibility differences between the three highly experienced veterans is the level of certification obtained. While CST10 has no certification and CST4 was certified as an educational CST by the state (both of whom obtained the lowest intelligibility scores), CST12 (who was highly accurate and highly intelligible) has national certification. In fact, two of the three most intelligible transliterators have obtained national certification, while those with state certification varied greatly with respect to their accuracy and intelligibility scores. Standards for state certification vary more and can be less rigorous than national certification, with many transliterators automatically granted state certification based on their number of years of experience (under a state's "Grandfather Clause"). As a result, the difference in requirements for state versus national certification may be great.
While differences in certification may explain the difference in intelligibility for the transliterators with the most years of experience, certification status alone does not always predict intelligibility. Both CST11 and CST12 obtained the highest intelligibility scores of all nine veterans; however, CST11 has no certification, while CST12 is nationally certified. Since CST training methods have changed over time, perhaps CST11's initial training focused on additional factors that affect intelligibility, such as facial expression, synchronization, or lip movement, where it is possible that CST12's training did not. However, it is worth noting that although CST11 scored high in intelligibility, he/she did not score nearly as high in accuracy. This difference in accuracy may be related to CST11's lack of certification and suggests that transliterator evaluation processes should be focused on intelligibility, rather than accuracy, particularly considering that it is unknown whether or not intelligibility differences exist in transliterators due to differences in physical speed, language, or speech skills (all variables that are difficult for tests to measure).

Conclusions

Accuracy plays a large role in intelligibility, but there are many other factors that affect transliterator message intelligibility. Sources of transliterator variability point to a number of factors (e.g., visual speech clarity, facial expression, non-manual markers, and cueing rate) that caused some transliterators to be higher or lower in intelligibility than would be predicted by the accuracy-intelligibility relationship. Sources of receiver variability (e.g., current comfort level with receptive cueing and processing strategies for
cue reading, with heavier reliance on either lips or cues) also caused some receivers to perform differently, obtaining higher or lower intelligibility scores given stimulus items of similar accuracy. While the field of interpreting cannot control for differences in receiver performance, interpreting standards can be introduced to reduce sources of variability between transliterators and improve overall intelligibility. In general, greater transliterator experience was found to have a positive effect on intelligibility. When comparing the performance of novices and veterans, the accuracy-intelligibility functions shifted to the left for veterans compared to novices, given that higher intelligibility scores were obtained with lower accuracy scores for veterans than for novices. However, even when experience is controlled, much of the variance in intelligibility remains unexplained by accuracy (44% for novices, 72% for veterans). Therefore, it is important to isolate and quantify the contribution of other factors, such as those identified here (speechreadability, facial expression, cueing rate, etc.), in future studies.

Future Work

While the use of 12 transliterators was sufficient for characterizing the accuracy-intelligibility relationship, it cannot be assumed that the results of these 12 individuals are representative of the performances of all Cued Speech transliterators. Therefore, future studies should expand the number of transliterators included. Special attention should be paid to recruiting more transliterators who will qualify as "novice" and "experienced" transliterators, as there were only two novices and one experienced transliterator for the
current study. Attention should also be paid to recruiting more nationally certified transliterators to ensure that the two nationally certified transliterators utilized in this study are representative of the skills of other nationally certified transliterators. Because a majority of the transliterators employed for this study were from a limited number of states, more transliterators with state certification from a variety of states should also be recruited in future studies. The inclusion of transliterators certified by other states would provide information regarding the variability of state standards.

Because differences in transliterator cueing rate were thought to be a factor in the intelligibility results of this study, future research is needed to quantify the effect of rate on message intelligibility. Even without recruiting more participants, an "optimal cueing rate" could be determined based on the existing data if cueing rate measurements were made for each stimulus item and correlated with the intelligibility results of this study. Future intelligibility experiments should also be constructed to draw from the full database of video clips (containing these 12 CSTs transliterating at slow, normal, and fast presentation rates). The effect of presentation rate could then be investigated by determining psychometric functions (analogous to those found in this study for intelligibility and accuracy, experience, and lag time) at each of the three presentation rates.

Finally, it is important to conduct similar experiments for communication modes other than Cued Speech. The quantitative analysis of the factors affecting intelligibility of transliterator messages is especially important for any communication mode utilized by deaf students in educational settings. This study is part of a larger study by Krause
(2006) that aims to complete similar intelligibility experiments for other communication modes, including Signing Exact English (SEE II), Conceptually Accurate Signed English (CASE, also sometimes referred to as Pidgin Signed English), and eventually American Sign Language (ASL).
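The left shift described in the Conclusions can be illustrated numerically by fitting a logistic psychometric function, I(a) = 100 / (1 + e^-((a - a50)/s)), to (accuracy, intelligibility) pairs and comparing the 50%-intelligibility accuracy threshold a50 across groups: a lower a50 for veterans is the left shift. The sketch below uses hypothetical data points and a fixed slope, not the study's measurements or fitting procedure:

```python
import math

def logistic(acc, a50, slope):
    """Psychometric function: predicted intelligibility (0-100)
    as a function of transliterator accuracy (0-100)."""
    return 100.0 / (1.0 + math.exp(-(acc - a50) / slope))

def fit_a50(points, slope=8.0):
    """Grid-search the accuracy threshold a50 that minimizes squared
    error against the observed points; slope is held fixed for simplicity."""
    best_a50, best_err = None, float("inf")
    for a50 in [x * 0.5 for x in range(0, 201)]:  # candidate thresholds 0.0-100.0
        err = sum((logistic(a, a50, slope) - i) ** 2 for a, i in points)
        if err < best_err:
            best_a50, best_err = a50, err
    return best_a50

# Hypothetical (accuracy %, intelligibility %) pairs -- illustrative only.
novices = [(40, 20), (55, 45), (70, 75), (85, 92)]
veterans = [(30, 22), (45, 50), (60, 78), (75, 93)]

a50_nov = fit_a50(novices)
a50_vet = fit_a50(veterans)
# A lower a50 for veterans means the same intelligibility is reached
# at lower accuracy -- the "left shift" in the psychometric function.
print(a50_nov, a50_vet, a50_vet < a50_nov)
```

A full analysis would also fit the slope parameter and use maximum-likelihood rather than least-squares fitting, but the comparison of thresholds captures the shift itself.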
References

Beasley, D. S., Bratt, G. W., & Rintelmann, W. F. (1980). Intelligibility of time-compressed sentential stimuli. Journal of Speech and Hearing Research, 23, 722-731.

Clarke, B. R., & Ling, D. (1976). The effects of using Cued Speech: A follow-up study. The Volta Review, 78, 23-34.

Cokely, D. (1986). The effects of lag time on interpreter errors. Sign Language Studies, 53, 341-375.

Cornett, R. O. (1967). Cued Speech. American Annals of the Deaf, 112, 3-13.

Films for the Humanities (Producer). (1989). The Life Cycle of Plants [Film]. (Available from Films Media Group, P.O. Box 2053, Princeton, NJ 08543-2053)

Fischer, S. D., Delhorne, L. A., & Reed, C. M. (1999). Effects of rate of presentation on the reception of American Sign Language. Journal of Speech, Language, and Hearing Research, 42, 568-582.

Gustason, G. (1990). "Signing Exact English." In H. Bornstein (Ed.), Manual communication: Implications for education (pp. 108-127). Washington, DC: Gallaudet University Press.

Hammil, D. D., Brown, V. L., Larsen, S. C., & Wiederholt, J. L. (1994). Test of adolescent and adult language (Third edition). Austin, TX: Pro-Ed.

Husaim, D., & Tessler, M. (2006). The effect of speaking rate, experience, and lag time on Cued Speech transliterator accuracy. Unpublished honors thesis, University of South Florida, Tampa, Florida.

Jones, B. E., Clark, G. M., & Soltz, D. F. (1997). Characteristics and practices of sign language interpreters in inclusive education programs. Exceptional Children, 63(2), 257-268.

Kluwin, T. N., & Stewart, D. A. (2001). Interpreting in schools, a look at research. Odyssey, Winter/Spring, 15-17.

Krause, J. C. (2006). Personal communication.

Krause, J. C., Schick, B., & Kegl, J. A. (in press). "A version of the Educational Interpreter Performance Assessment for Cued Speech transliterators: Prospects and significance." In Cued Speech and Cued Language Development of Deaf Students (Chapter 15). San Diego, CA: Plural Publishing, Inc.

LaSasso, C. J., & Metzger, M. A. (1998). An alternate route for preparing deaf children for BiBi programs: The home language as L1 and Cued Speech for conveying traditionally spoken languages. Journal of Deaf Studies and Deaf Education, 3, 264-289.

Leybaert, J., & Charlier, B. L. (1996). Visual speech in the head: The effect of Cued Speech on rhyming, remembering, and spelling. Journal of Deaf Studies and Deaf Education, 1, 234-248.

Magner, M. E. (1972). A speech intelligibility test for deaf children. Northampton, MA: Clarke School for the Deaf.

Marschark, M., Sapere, P., Convertino, C., & Seewagen, R. (2005). Access to postsecondary education through sign language interpreting. Journal of Deaf Studies and Deaf Education, 10(1), 38-50.

Miller, G. A. (1956). The magical number seven plus or minus two: Some limitations on our capacity for processing information. Psychological Review, 63, 81-97.

Nicholls, G. H., & Ling, D. (1982). Cued Speech and the reception of spoken language. Journal of Speech and Hearing Research, 25, 262-269.

Park, W. E. (2005). Personal communication.

Pelley, K. A., Husaim, D., Tessler, M., Lindsay, J., & Krause, J. C. (2006). The effect of speaking rate and experience on Cued Speech transliterator accuracy. Unpublished poster session, ASHA, Miami, Florida.

Reed, C. M., Delhorne, L. A., Durlach, N. I., & Fischer, S. D. (1990). A study of the tactual and visual reception of fingerspelling. Journal of Speech and Hearing Research, 33, 786-797.

Scheetz, N. A. (2001). Orientation to Deafness (Second edition). Needham Heights, MA: Allyn and Bacon.

Schick, B., Williams, K., & Bolster, L. (1999). Skill levels of educational interpreters working in public schools. Journal of Deaf Studies and Deaf Education, 4(2), 145-155.

Schick, B., & Williams, K. (2001). Evaluating interpreters who work with children. Odyssey, Winter/Spring, 12-14.

Schick, B., Williams, K., & Kupermintz, H. (2006). Look who's being left behind: Educational interpreters and access to education for deaf and hard-of-hearing students. Journal of Deaf Studies and Deaf Education, 11(1), 3-20.

Smart, J. (2007). Personal communication.

Strong, M., & Rudser, S. F. (1985). An assessment instrument for sign language interpreters. Sign Language Studies, 49 (Winter), 343-362.

Strong, M., & Rudser, S. F. (1986). The subjective assessment of sign language interpreters. Sign Language Studies, 53 (Winter), 299-313.

Tessler, M. (2007). Personal communication.

Tope (2008). The effect of bilingualism on L2 speech perception. Unpublished undergraduate honors thesis, University of South Florida, Tampa, Florida.

Traxler, C. B. (2000). The Stanford Achievement Test, ninth edition: National norming and performance standards for deaf and hard-of-hearing students. Journal of Deaf Studies and Deaf Education, 5(4), 337-348.

Wandel, J. E. (1989). Use of internal speech in reading by hearing and hearing impaired students in Oral, Total Communication, and Cued Speech programs. Unpublished doctoral dissertation, Columbia University, New York.

Wilson, R. H., & Strouse, A. L. (1999). "Auditory measures with speech signals." In Contemporary perspectives in hearing assessment. Needham Heights, MA: Allyn & Bacon.

Winston, E. (1989). "Transliteration: What's the message?" In C. Lucas (Ed.), The sociolinguistics of the Deaf community. San Diego, CA: Academic Press.
Appendix A: Participant Information

Table A1. Cued Speech Receivers (values listed in order CS01-CS08)

Age: 27, 30, 39, 34, 25, 28, 20, 33
Gender: M, F, M, F, M, F, F, F
Education: B.A., Some college, B.A., Ph.D., Some college, Some college, Some college, B.A.
Hearing level: Profound, Profound, Profound, Severe to profound, Profound, Profound, Profound, Profound
CS screening: 100%, 100%, 100%, 100%, 100%, 100%, 100%, 95%
TOAL-3 percentile score: 95, 95, 98, 84, 75, 75, 84, 75
First language: English, English, English, English and French, English, English, English and ASL, English
Age first exposed to CS: 2, 5.5, 3, 2, 2, 10 months, 1, 3
CS use at home: Y, Limited, Y, Y, Y, Y, Y, Y
CS use at school: Y, Y, Y, Limited, Y, Y, Y, Y
Years of CS use: 25, 24, 36, 32, 23, 27, 19, 30
Preferred communication mode: English (oral and cued); No response; English (oral and cued); Spoken English; ASL, Cued Speech; Cued Speech; Cued Speech/English; Cued Speech and Sign Language
Other languages fluent: Signed English, None, None, ASL, ASL, Sign Language, Spanish, None
Age exposed (other language): 18, N/A, N/A, 17, 17, 19, 12, N/A
Appendix A (Continued)

Table A2. Background and Scores (given 100% Accurate Stimuli) for Normal-Hearing Listeners (Tope, 2008)

       Bilingual?  Education     TOAL-3 Percentile  All-Word Int. (%)  Key-Word Int. (%)
BL01   Y           M.A.          95                 98.66              99.48
BL02   Y           Some college  84                 93.23              96.62
BL03   Y           Ph.D.         84                 97.65              99.22
BL04   Y           B.A.          84                 99.26              99.87
BL05   Y           Some college  84                 99.46              99.87
BL06   Y           B.A.          91                 98.72              99.48
BL07   Y           Some college  63                 95.50              97.92
BL08   Y           B.A.          98                 99.53              99.74
ML01   N           Some college  95                 99.40              99.87
ML02   N           Some college  84                 99.19              99.87
ML03   N           B.A.          95                 99.66              100.00
ML04   N           Some college  75                 99.06              99.48
Appendix B: Number of Clips per CST

        Ideal  Block 1  Block 2  Block 3  Block 4
CST1    19     21       19       19       17
CST2    19     16       14       21       16
CST3    19     20       16       13       16
CST4    19     23       22       25       22
CST5    19     19       21       20       20
CST6    19     17       19       18       16
CST7    19     17       17       14       16
CST8    19     21       25       21       26
CST9    19     21       19       20       19
CST10   19     18       25       24       25
CST11   19     20       21       17       20
CST12   19     18       13       19       18
Appendix C: Individual Data

[Figures C1-C4 each contain panels plotting all-word OM (%) against accuracy (%), with curves for the >70%, >75%, and >80% intelligibility thresholds.]

Figure C1. Accuracy-intelligibility likelihood functions plotted for each expert receiver, CS01 through CS04 (each showing the proportion of data points that reach 70% or higher intelligibility for a given accuracy interval).

Figure C2. Accuracy-intelligibility likelihood functions plotted for each expert receiver, CS05 through CS08 (each showing the proportion of data points that reach 70% or higher intelligibility for a given accuracy interval).

Figure C3. Accuracy-intelligibility likelihood functions plotted for each transliterator, CST1 through CST6 (each showing the proportion of data points that reach 70% or higher intelligibility for a given accuracy interval).

Figure C4. Accuracy-intelligibility likelihood functions plotted for each transliterator, CST7 through CST12 (each showing the proportion of data points that reach 70% or higher intelligibility for a given accuracy interval).
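Likelihood functions of the kind shown in Figures C1 through C4 can be computed with a short binning routine: for each accuracy interval, take the proportion of stimulus items whose intelligibility met the threshold. The sketch below uses hypothetical (accuracy, intelligibility) points, not the study's stimulus items:

```python
def likelihood_function(points, bin_width=10, threshold=70):
    """For each accuracy bin (keyed by its lower edge), return the
    proportion of items whose intelligibility met the threshold."""
    bins = {}
    for acc, intel in points:
        lo = int(acc // bin_width) * bin_width  # lower edge of this item's bin
        hits, total = bins.get(lo, (0, 0))
        bins[lo] = (hits + (intel >= threshold), total + 1)
    return {lo: hits / total for lo, (hits, total) in sorted(bins.items())}

# Hypothetical (accuracy %, intelligibility %) stimulus items.
pts = [(62, 55), (65, 74), (68, 80), (74, 72),
       (76, 88), (78, 66), (83, 91), (86, 95)]
print(likelihood_function(pts))
```

Rerunning the routine with `threshold=75` and `threshold=80` would produce the other two curves plotted in each panel.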