controlfield tag 001 001920011
subfield code a E14-SFE0002084
Effects of clear speech and linguistic experience on acoustic characteristics of vowel production [electronic resource] / by Michelle Bianchi.
[Tampa, Fla.] : University of South Florida,
ABSTRACT: The present study investigated the hypothesis that later and/or early learners of English as a second language may exhibit an exaggerated or restricted degree of change in their production performance between clear and conversational speech styles for certain acoustic cues. Monolingual English talkers (MO), early Spanish-English bilinguals (EB) and late Spanish-English bilinguals (LB) were recorded using both clear and conversational speaking styles. The stimuli consisted of six target vowels /i, I, e, E, ae/ and /a/, embedded in /bVd/ context. All recorded target-word stimuli were isolated into words. Vowel duration was computed, and fundamental frequency (F0), and formant frequency values (F1-F4) were measured at 20%, 50%, and 80% of the vowel duration. Data from the MO and EB talkers indicates that these two groups are very similar in that they emphasize duration differences in clear speech, have similar spacing of vowels (static & dynamic properties), and have similar frequency changes in clear speech. Data from the LB talkers indicates that this group failed to emphasize differences in clear speech, particularly duration differences. In addition, the high-mid front vowels (/i, I, e/ and /E/) were found to be very poorly separated in the F1-F2 space for the LB talkers. In support of the hypothesis, the data showed that LB talkers exhibited a restricted degree of change in their production performance between clear and conversational speech styles for duration, as compared to monolingual talkers. Data analyzed for the EB talkers do not reveal systematic reductions in the degree of change in their production performance between clear and conversational speech styles, as compared to monolingual talkers.
Thesis (M.S.)--University of South Florida, 2007.
Includes bibliographical references.
Text (Electronic thesis) in PDF format.
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
Title from PDF of title page.
Document formatted into pages; contains 67 pages.
Advisor: Catherine L. Rogers, Ph.D.
Speech Language Pathology
USF Electronic Theses and Dissertations.
Effects of Clear Speech and Linguistic Experience on Acoustic Characteristics of Vowel Production
by
Michelle Bianchi

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science
Department of Communication Sciences and Disorders
College of Arts and Sciences
University of South Florida

Major Professor: Catherine L. Rogers, Ph.D.
Jean C. Krause, Ph.D.
Stefan A. Frisch, Ph.D.

Date of Approval: July 17, 2007

Keywords: Spanish vowels, acoustic analysis, spectral change, second language, bilingualism

Copyright 2007, Michelle Bianchi
Dedication

It is only by the grace of God that I have been blessed with the gifts necessary for this accomplishment. If not for His provisions of time, talent, patience, and determination, this work would not have been possible. Therefore, I dedicate this thesis to the One who called me to complete it.

I thank my amazing husband, Dominick, for his love, support, encouragement, and understanding over the last three years of this journey. I remember standing in the kitchen when you told me to go for it. We had no idea how hard it would be, but we both know that good things don't always come easy.

I thank my loving children, Trey and Tanner, for always praying for me to get A's and for all of the times you occupied yourselves while I wrote. You won't have to go to Dad's shop anymore!

I thank my wonderful advisor, Dr. Rogers. You have been in my corner from day one, when I nervously called you to ask if I could take your class as a non-degree seeking student. You took me under your wing and spoke on my behalf during the admissions process. You always understood the needs of my family and helped me to balance them with school. You taught me how to think and shared your gifts and your life stories with me. You've laughed with me and cried with me. You have inspired me with your brilliant mind, and I will never forget you. Thank you for your kindness. You are an angel from heaven.
Acknowledgments

The research in this thesis was supported by NIH-NIDCD grant #5R03 DC005561 to Catherine L. Rogers (major professor).

I would like to thank Teresa DeMasi for the countless hours she spent finding and running subjects, teaching me, watching my boys, and especially for EDITING! We thought we'd never finish, girl! You have been a wonderful friend, and I will miss you terribly.

Thank you, Dr. Krause and Dr. Frisch, for taking the time to read this thesis and make comments. Your comments were very helpful and much appreciated.

Thank you, Merete Møller Glasbrenner, for the foundation you laid for my work, for making me feel welcome in the lab, and for making me laugh.

Thank you, Dr. Diehl, for helping me with the overall organization of my thesis when I began writing. The time you invested in me was much appreciated.

Thank you, Mary, Jen, and Barb, for taking care of my boys. I never worried when they were with you. You gave me peace of mind so I could focus. I love you all.
Table of Contents

List of Tables
List of Figures
Abstract
Chapter One Introduction
    Overview and Statement of the Problem
    Linguistic Experience
        The Speech Learning Model
        Vowel Inventories of Spanish and English and Predictions of the SLM
    Acoustic Properties of Vowels and Studies of Vowels Produced by L2 Learners
        Acoustic Properties of Vowels
        Vowels Produced by L2 Learners
    Clear versus Conversational Speech
    Purpose of the Present Study
Chapter Two Method
    Inclusion and Exclusion Criteria
    Participants
    Materials
    Recording Procedure
    Editing Procedure
    Settings for Acoustic Analysis
    Vowel Duration Measurement
    Frequency Measurements
Chapter Three Results
    Vowel Duration
    F1 at 50% of Vowel Duration
    F2 at 50% of Vowel Duration
    Length of Vector from 20% to 80% of Vowel Duration
Chapter Four Discussion
    Summary of Results
    Comparisons to Previous Studies
    Limitations and Implications for Future Research
    Conclusion
References
List of Tables

Table 1 Demographic data for early bilingual talkers
Table 2 Demographic data for late bilingual talkers
Table 3 Statistical results for vowel duration
Table 4 Statistical results for F1 at 50% of vowel duration
Table 5 Statistical results for F2 at 50% of vowel duration
Table 6 Statistical results for two-point vector length
List of Figures

Figure 1. Average durations (in ms) of target vowels for words produced in conversational and clear speech styles.
Figure 2. Average steady-state (50% of vowel duration) F1 and F2 frequencies (in Barks) for vowels in conversational and clear speech (MO and EB talkers).
Figure 3. Average steady-state (50% of vowel duration) F1 and F2 frequencies (in Barks) for vowels in conversational and clear speech (MO and LB talkers).
Figure 4. Average F1 and F2 frequencies (in Barks) at 20% and 80% of vowel duration for vowels in conversational (black arrows) and clear (gray arrows) speech (MO and EB talkers).
Figure 5. Average F1 and F2 frequencies (in Barks) at 20% and 80% of vowel duration for vowels in conversational (black arrows) and clear (gray arrows) speech (MO and LB talkers).
Effects of Clear Speech and Linguistic Experience on Acoustic Characteristics of Vowel Production

Michelle Bianchi

ABSTRACT

The present study investigated the hypothesis that later and/or early learners of English as a second language may exhibit an exaggerated or restricted degree of change in their production performance between clear and conversational speech styles for certain acoustic cues. Monolingual English talkers (MO), early Spanish-English bilinguals (EB), and late Spanish-English bilinguals (LB) were recorded using both clear and conversational speaking styles. The stimuli consisted of six target vowels, /i, I, e, E, ae/ and /a/, embedded in /bVd/ context. All recorded target-word stimuli were isolated into words. Vowel duration was computed, and fundamental frequency (F0) and formant frequency values (F1-F4) were measured at 20%, 50%, and 80% of the vowel duration.

Data from the MO and EB talkers indicate that these two groups are very similar in that they emphasize duration differences in clear speech, have similar spacing of vowels (static and dynamic properties), and have similar frequency changes in clear speech. Data from the LB talkers indicate that this group failed to emphasize differences in clear speech, particularly duration differences. In addition, the high-mid front vowels (/i, I, e/ and /E/) were found to be very poorly separated in the F1-F2 space for the LB talkers. In support of the hypothesis, the data showed that LB talkers exhibited a restricted degree of change in their production performance between clear and conversational speech styles for duration, as compared to monolingual talkers. Data analyzed for the EB talkers do not reveal systematic reductions in the degree of change in their production performance between clear and conversational speech styles, as compared to monolingual talkers.
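The measurement scheme summarized in the abstract (values taken at 20%, 50%, and 80% of the vowel duration) and the two-point vector length listed among the results can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the onset and offset times are taken as given, and the Hz-to-Bark conversion uses the Traunmüller (1990) approximation, which may not be the conversion used in this thesis.

```python
import math


def measurement_times(onset_s, offset_s, fractions=(0.2, 0.5, 0.8)):
    """Return time points (seconds) at the given fractions of the vowel duration."""
    duration = offset_s - onset_s
    if duration <= 0:
        raise ValueError("vowel offset must be later than vowel onset")
    return [onset_s + f * duration for f in fractions]


def hz_to_bark(f_hz):
    """Convert Hz to Bark with the Traunmueller (1990) approximation.

    Assumption: the thesis may have used a different Hz-to-Bark formula.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def vector_length_bark(f1_20, f2_20, f1_80, f2_80):
    """Euclidean length (in Bark) of the F1-F2 movement from the 20% to the 80% point."""
    d_f1 = hz_to_bark(f1_80) - hz_to_bark(f1_20)
    d_f2 = hz_to_bark(f2_80) - hz_to_bark(f2_20)
    return math.hypot(d_f1, d_f2)


if __name__ == "__main__":
    # Hypothetical vowel spanning 0.100 s to 0.300 s, with made-up formant values.
    print(measurement_times(0.100, 0.300))
    print(vector_length_bark(500.0, 1800.0, 450.0, 2000.0))
```

A vowel whose formants do not move between the two points has a vector length of zero, so larger values indicate more spectral change over the course of the vowel.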
Chapter One
Introduction

Overview and Statement of the Problem

Non-native speakers of English living in the United States must learn to adapt to many environmental challenges in speaking conditions if they are to be as well understood as native talkers in their daily lives. Some of the environmental challenges that occur quite frequently are background noise, reverberation, and the filtering that occurs in telephone communication. All of these factors have been shown to affect the intelligibility of native talkers (Payton, Uchanski, & Braida, 1994; Bradlow & Alexander, 2007; Bradlow & Bent, 2002), and native talkers have been shown to develop speaking strategies that can partially overcome these challenges (Bradlow & Alexander, 2007; Ferguson, 2004). Yet the development of speech sound production abilities across different speaking conditions by adult Spanish speakers of English has received relatively little investigation. Each phoneme is identified by listeners through a range of speech cues, and differences on any of these can result in a detectable foreign accent and may impede communication in difficult environments.

The rapid growth in the number of Spanish-speaking bilinguals in the United States (approximately 28 million persons at the 2000 Census; United States Census
Bureau, 2000) has given rise to the need for research in the area of speech production by this population. As evidenced by the recent growth of accent modification therapy by speech-language pathologists, second-language (L2) learners are eager to learn native-like pronunciation.

Non-native speakers of English often have difficulty being understood. Under difficult speaking conditions, intelligibility differences between native and non-native speakers can increase. Rogers, Dalby, & Nishi (2004) found that even mildly accented non-native speakers of English, who were nearly as intelligible as native speakers in quiet, were substantially less intelligible than native English speakers in noise. Research is needed to understand the conditions in which non-native speakers may have particular difficulty being understood and to develop effective treatment techniques for non-native speakers of English.

During second language acquisition, L2 learners strive for native-like pronunciation. Spanish learners of English must learn a range of speech cues and their relative importance to achieve native-like performance. For several decades, researchers have examined the overall degree of foreign accent in L2 (cf. Flege, 1995). Due to the many factors that may influence the degree of L2 foreign accent, numerous studies have been completed in an effort to identify the most important predictors of foreign accentedness. Linguistic experience, including age and duration of immersion in an environment where the L2 is spoken, is a variable that has emerged as a major area of research in the field
(Bohn & Flege, 1997; Flege, 1995; Flege, Bohn, & Jang, 1997; MacKay & Flege, 2004; Piske, MacKay, & Flege, 2001).

Accuracy of production of target phonemes and their acoustic correlates is another area that has been extensively investigated in terms of its relationship to degree of foreign accentedness. For vowels, the most frequently investigated acoustic variables in studies of L2 speech have been vowel duration and target formant frequencies, typically measured as formant frequencies at the vowel midpoint (Bohn & Flege, 1997; Flege, Bohn, & Jang, 1997). However, recent studies of vowels produced by native speakers of English have begun to focus on dynamic properties of vowels, defined as the degree and direction of change in formant frequencies during vowel production (Hillenbrand, Getty, Clark, & Wheeler, 1995; Hillenbrand & Nearey, 1999). Hillenbrand et al. (1995) found that even the "monophthongal" vowels of American English showed characteristic differences in the direction and degree of change in formant frequencies measured from 20% to 80% of the vowel duration. In a follow-up study, Hillenbrand & Nearey (1999) found that vowels were about 14.7% more intelligible, on average, when this dynamic information was retained than when it was not.

Very few studies of L2 vowel production have examined the dynamic properties of vowels and how these properties may contribute to accentedness and intelligibility of second-language learners. In one of the few studies relating vowel formant dynamics to accentedness or intelligibility of L2 speech, however, Kewley-Port, Akahane-Yamada & Aikawa (1996) found that the appropriate use
of spectral change in vowel production greatly contributes to the intelligibility of vowels produced by Japanese-accented English speakers. Thus, further research examining the acquisition of dynamic properties of vowels is important to understanding the acquisition of native-like proficiency in vowel production by L2 learners.

Another area that has received relatively little attention in studies of second-language speech production is the degree to which non-native speakers can change speaking style to adapt to challenging speaking environments. There is, however, some literature on the ability of both native and non-native speakers to modify their speaking style in response to speaking environment and the effects of these modifications on intelligibility (Bradlow & Alexander, 2007; Bradlow & Bent, 2002; Ferguson, 2004; Ferguson & Kewley-Port, 2002; Johnson, Flemming, & Wright, 1993; Picheny, Durlach, & Braida, 1985a; Picheny, Durlach, & Braida, 1985b).

Clear speech is a speaking style that is often used to increase the effectiveness of communication. It is typically used when speaking with those who are hearing impaired or in other situations when a listener may have trouble understanding (Picheny, Durlach, & Braida, 1985a). Researchers have found that the use of clear speech by native speakers positively affects intelligibility for native listeners (Bradlow & Alexander, 2007; Bradlow & Bent, 2002; Ferguson, 2004; Ferguson & Kewley-Port, 2002; Johnson, Flemming, & Wright, 1993; Picheny, Durlach, & Braida, 1985a; Picheny, Durlach, & Braida, 1985b). For sentences presented in noise to normal-hearing native listeners, clear speech
has been shown to be about 10-17% more intelligible than normally produced or "conversational" speech (Bradlow & Alexander, 2007; Bradlow & Bent, 2002; Krause & Braida, 2002). This increase in intelligibility is typically referred to as the "clear speech benefit." For identification of vowels presented in noise to normal-hearing native listeners, a clear speech benefit of about 8% has been found (Ferguson, 2004).

Bradlow and colleagues (Bradlow & Bent, 2002; Bradlow & Alexander, 2007) have compared the intelligibility of clear speech produced by native English speakers for native English-speaking listeners to its intelligibility for listeners for whom English is a second language. They have found a significantly smaller clear speech benefit for the non-native listeners than for the native listeners. They attribute the decreased clear speech benefit for the non-native listeners to an incomplete linguistic knowledge of the cues enhanced in the clear speech context. If this hypothesis is true, the same incomplete linguistic knowledge may contribute to a reduction for non-native speakers in the acoustic enhancements that occur in clear speech, relative to native speakers. Thus, comparing the acoustic characteristics of phonemes produced in conversational and clear speech styles by native and non-native speakers may be a useful way of examining productive linguistic knowledge in these populations. Understanding these differences may then result in improved methods of accent reduction training for non-native speakers. No research, however, has been found comparing the acoustic properties of clear and conversational speech produced
by native English speakers to the properties of the clear and conversational speech produced by non-native speakers of English. Thus, the purpose of the present study is to compare the acoustic characteristics of vowels spoken by native and non-native (Spanish-English bilingual) speakers of American English in both conversational and clear speech styles.

To develop the methodology for the present study, a number of factors had to be considered in detail. The remainder of this chapter will therefore be used to review in more depth the following topics: theory and research on the role of linguistic experience in second-language speech production; acoustic characteristics of American English vowels and research on vowels produced by L2 learners; and previous research on acoustic and perceptual characteristics of clear speech.

Linguistic Experience

The speech learning model. The speech learning model (SLM) developed by Flege (1995) attempts to explain the way age and the primary language (L1) phonological system affect one's ability to achieve native-like performance in pronunciation and perception of L2 phonemes. The model's premise is that when learning our L1, we perceive the phonetic differences between sounds and create separate phonetic categories for all of the sounds of our L1, including separate categories for at least some of the allophonic variants of phonemes (Flege, 1995). When learning the L2, however, the model asserts that learners may either fall short of perceiving the differences between pairs of speech sounds within the L2, or may fail to perceive differences between certain L2 and
L1 speech sounds (Flege, 1995). The model further hypothesizes that the L2 learner's failure to discriminate between certain L2 and L1 sounds may be due to assimilation of the L2 sounds to familiar L1 phonetic categories and that the L1 phonology may filter out features of L2 sounds that are not distinctive in the L1 (Flege, 1995). Another important feature of the SLM is the proposal that the L1 phonemes become stronger "attractors" of L2 phonemes as age of onset of learning a second language increases (Flege, Schirru & MacKay, 2003).

The SLM also makes predictions about changes in categorization of L2 sounds over time. During the early stages of L2 acquisition, the model asserts that some L2 sounds will be identified by the learner as being the same as an L1 phoneme or one of its allophones, while other L2 phonemes may fall into uncommitted space or may not be identifiable as any L1 phoneme. Over time, however, the model predicts that the L2 learner will become better able to notice the differences between at least some of the L1 and L2 sounds. At this point, the learner may develop a new sound category, or, as Flege terms it, a phonetic category, to represent differing L1/L2 sounds.

The SLM reflects the idea that if an L2 sound is perceptually linked to an L1 sound, production of the L1 and L2 versions may eventually merge (Flege, 1995). According to the model, the likelihood that L1 and L2 phonemes will merge is influenced by the age of onset of learning (AOL) of the L2 and by the distance between L1 and L2 sounds as perceived by the learner. The likelihood that an L2 sound will be placed into a new phonetic category increases as the distance the learner perceives between the L1 and L2 sounds increases.
Similarly, the earlier the AOL, the smaller the distance between sounds needs to be in order for the learner to categorize the L2 sound as different from the L1 sound.

Vowel inventories of Spanish and English and predictions of the SLM. According to most sources, English is assumed to have approximately 12 "monophthongal" vowels (/ i, ,, ,e,o, , u /) (Ladefoged, 1982), while Spanish has five (/i, e, a, o, u/) (Dalbor, 1969). Thus, Spanish learners of English must adapt their acoustic vowel space to include the new English vowels. Although some English vowels have a phonemic counterpart in Spanish, namely /i, e, a, o, u/, others do not. According to Bradlow (1995), the vowel spaces of English and Spanish differ in several ways. She states that although some vowel categories occupy similar positions in the acoustic space of English and Spanish, they are not in precisely the same position. So in addition to needing to find a position in their articulatory vowel space for about seven new vowels in order to have native-like vowel production, Spanish speakers must also fine-tune the production of similar vowels in English. Spanish vowels are also assumed to be produced with little or no spectral change as compared to English vowels, although this issue has not been extensively investigated for Spanish (Flege, 1991).

For a Spanish learner of English, a prediction of the SLM is that the difference in size of the vowel inventories of Spanish and English might result in a large number of English vowels being assimilated to Spanish vowel categories,
especially by later learners. Thus, a native Spanish (NS) speaker who began learning English at an early age should be more likely to differentiate between all English vowels than a native Spanish speaker who began learning English later. Flege (1995) suggests that an earlier learner's production of target L2 vowels should exhibit greater accuracy, but also suggests that these productions may be deflected from the target positions for native talkers due to the need to maintain phonetic distance between similar L1 and L2 sounds. The present study will help to address these hypotheses by examining vowel productions of earlier and later Spanish learners of English and by comparing dynamic features of these vowels. Because formant dynamic properties have not been extensively investigated, they should offer a unique means of providing supporting (or disconfirming) evidence for the predictions of the SLM.

Acoustic Properties of Vowels and Studies of Vowels Produced by L2 Learners

Acoustic properties of vowels. In the classic study by Peterson & Barney (1952), the authors conducted an experiment that addressed target formant frequencies (F1-F3), vowel spaces, and variation across vowels produced by men, women, and children. Formant frequencies, formant amplitudes, and fundamental frequency (F0) were measured at a single time slice. The authors found that formant frequencies were highly variable for each speaker. In addition, there was a considerable degree of overlap between vowel formant frequencies for vowels of different categories. In particular, considerable overlapping existed between / / and / /, / / and / /, /u/ and / /, and / / and / /. The F3 values were
largely variable across all three groups of talkers. The men had the lowest frequencies, the women had intermediate frequencies, and the children had the highest frequencies.

In Hillenbrand et al. (1995), the authors attempted to replicate and to address the limitations of the study of vowel acoustics by Peterson & Barney (1952) (PB). The limitations included: (1) measurements were taken at a single time slice; (2) duration measurements were not made; (3) measurements of spectral change over time were not made; (4) speaker and listener dialect was not considered; (5) data on age and gender of child talkers were not provided; (6) the child group was small; (7) identifiability of tokens could not be determined; (8) reliability of measurements was not reported; and (9) the database is no longer available and cannot be used to make F0 and formant frequency comparisons.

The authors extended the PB study to include measures of vowel duration and spectral change information by native speakers of English. To measure spectral change, vowel formant measurements were made at 20%, 50%, and 80% of the vowel duration, as measured from onset of voicing for the vowel to onset of closure for the stop, for the /hVd/ words recorded. The authors also attempted to replicate the "target" vowel measurements of PB by making formant measurements at the location within the vowel judged to have the least amount of change in the first and second formants (F1 and F2, respectively).

Hillenbrand et al. (1995) also converted formant frequencies from Hz to mels for analysis of spectral change in order to present the data in a way that would be better correlated with listeners' perceptions. The vowels with the longest durations were / /, / /, and /e/, and the vowels with the shortest
durations were / /, / /, / / and / /. Durations of vowels produced by male speakers were shorter than those for vowels produced by women and children.

The vowels with the greatest degree of spectral change were / /, / /, / / and / /, and the vowels with the smallest degree of spectral change were /u/, /i/, and / /. Vowels that are in close proximity to each other differ in their changes in F1 and F2 and in duration. For example, although the vowels / / and / / are located close together at 80% of vowel duration, both F1 and F2 are substantially higher for / / than for / / at 20% of the vowel duration.

Average formant values for the three talker groups from Hillenbrand et al. (1995) reflected a general tendency toward crowding among adjacent vowel categories as compared to the PB data. The only vowels that did not occupy similar relative positions in Hillenbrand et al. (1995) and in the PB data were / / and / /, with higher F2 values for / / than / / and lower F1 values for / / than / / in Hillenbrand et al. (1995) than in PB.

In a follow-up study, Hillenbrand and Nearey (1999) showed that the vowels' formant dynamic properties are used by listeners for vowel identification. Hillenbrand and Nearey (1999) created two sets of synthetic versions of /hVd/ words modeled on the properties of vowels produced by the talkers in Hillenbrand et al. (1995). In one set of synthetic stimuli (dynamic vowels), they preserved the direction and degree of formant change observed in the natural
vowels, and in another set (static vowels) they maintained a single "target" formant frequency throughout the vowels. The synthetic vowels were played to listeners, who had to decide which word they had heard. The dynamic vowels were identified about 14.7% more accurately by the listeners than the static vowels. The vowels that were most affected by the addition of the dynamic information were /e/, / /, / /, / /, and /o/. Conversely, / /, / /, and / / were least affected by the addition of the dynamic information.

Vowels produced by L2 learners. Spectral change is an important cue for vowel identification by native listeners (Strange, Jenkins, & Johnson, 1983; Hillenbrand & Nearey, 1999). Since native listeners rely on spectral change, further research that specifically addresses the use of spectral change in vowel production by non-native speakers is needed.

Little research has addressed the use of formant dynamic cues by Spanish speakers of English in vowel production. Appropriate use of spectral change in vowel production has been shown to contribute to non-native speech intelligibility for Japanese-accented English speakers, however (Kewley-Port, Akahane-Yamada & Aikawa, 1996). In Kewley-Port et al. (1996), the aim of the authors was to gain knowledge of the perception and production of American English (AE) vowels by Japanese talkers. Three experiments were conducted, including open-set identification, minimal-pair identification, and acoustic correlation of perception and production. The major finding of this study was that spectrally similar AE vowels produced by native speakers of Japanese were less intelligible to native English speakers than were dissimilar vowels. The
authors concluded that the Japanese talkers were unable to effectively communicate all of the spectral properties of the target AE vowels. The authors used regression analysis to study the influence of three acoustic properties (target frequency, dynamic formant movement, and duration) of vowels produced by Japanese-accented English speakers on the intelligibility of / / and / / for native English-speaking listeners. They found that spectral change of Japanese English vowels relative to the AE targets was the most important property influencing intelligibility of these two vowels. Although duration was found to be significant for / /, it was not independently responsible for increased intelligibility.

Bohn & Flege (1997) found that experienced adult German learners' productions of an English vowel category that is not present in German were perceived as native-like by native English-speaking listeners. The authors recorded the production of / / by three groups: monolingual English speakers, experienced German learners of English, and inexperienced German learners of English. The general distribution of the vowels in the Bark-difference space revealed that the inexperienced German subjects' German vowels did not occupy the same space as the English subjects' / /. The authors concluded that this is sufficient evidence to support the premise that the English / / is a new vowel for their German subjects.

Next, the three groups each recorded productions of the words bat and bet in the carrier phrase I will say ___. The fundamental and formant frequency
measurements (F1, F2, and F3) and duration of the vowels were examined for all three groups. With regard to formant frequency, the authors concluded that both monolinguals and experienced subjects produced fairly clear distinctions between the two vowels; however, inexperienced subjects' productions revealed an almost complete overlap of the target vowels / / and / /. With regard to vowel duration, the authors found that both the monolingual and experienced bilingual groups had similar durational ratios for the two vowels. Conversely, the inexperienced group had smaller ratios for the two vowels. The authors concluded that the results support the hypothesis that experienced adult learners will accurately produce a new vowel, but inexperienced adult learners will not.

In an effort to determine whether the perception of / / related to the aforementioned findings, the authors conducted another experiment. Synthetic speech was created that manipulated duration and formant frequency values to simulate the target / / to the target / /. Intermediate formant values between those appropriate for American English / / and / / were used to create a continuum of synthetic vowel stimuli between the end vowels. Each of the eleven synthetic stimuli created was presented at durations of 150, 200, and 250 ms. The same subjects as were recorded for the acoustic analysis listened to the stimuli and identified them as either bet or bat.

From the results of the perception experiment, the authors concluded that the monolinguals relied most on spectral differences to identify bet versus bat, followed by the experienced group, with the inexperienced group relying least on
spectral differences. Conversely, the inexperienced group relied most on duration, followed by the experienced group, with the monolinguals relying least on duration.

Bohn & Flege (1997) theorize that, contrary to the predictions of the SLM, experience may influence production more than perception for the / /-/ / contrast for German learners of English because the experienced Germans' productions appeared to be more native-like than their perception. One explanation for this difference may be related to the feedback that immersed learners receive for this notoriously difficult vowel contrast. That is, immersed L2 learners gain more feedback on their production versus their perception in their second language. Conversely, L2 classroom learners would tend to have more feedback given to them on their perception of the new language.

Flege, Bohn & Jang (1997) studied vowel production by Spanish speakers of English; however, their study did not include spectral change information. The acoustic analysis in their study was limited to the midpoint of the vowel. Their aim was to explore the effect of L2 experience on non-native speakers' production of the English vowels /i, / as judged by native English listeners. The speakers included twenty each of German, Spanish, Mandarin, and Korean subjects, and 10 native speakers of English. Native speakers of English evaluated the intelligibility of the natives' and non-natives' productions of the English vowels /i, , / in bVt context, within the carrier phrase I will say. The native English speakers were given seven choices by which to identify each
production ("beat, bit, bet, bat, bait, but" and "bottle"). An intelligibility score of percent correct identification by the native listeners was obtained for each native and non-native talker.

Although the main effect of experience on intelligibility was not found to be significant, the interaction between experience and vowel was found to be significant. The Spanish talkers' productions of / / yielded a higher percentage of correct vowel identifications by native English listeners than did their productions of /i/ and / /. Spanish talkers' intended / / productions were often heard as / /. The authors concluded that the Spanish talkers were producing a vowel for target / / that was more posterior in vowel space than American English / /. Conversely, the Spanish talkers' productions of / / were almost always correctly identified by the native English listeners. The authors concluded that this is due to an allophone of Spanish /e/ being directly transferred into English.

The authors found evidence that undermines the Contrastive Analysis Hypothesis (Lado, 1957, as cited by Flege et al., 1997). According to the authors, the theory by Lado suggests that the absence of a vowel from the L1 phonemic inventory may represent a source of learning difficulty. This theory is not supported by the authors' finding for Spanish learners of English. They found that Spanish subjects' intended productions of / / (a phoneme not found in Spanish) were more often correctly identified than their intended productions of /i/ (a phoneme found in Spanish) and / / (a phoneme not found in Spanish).
Clear versus Conversational Speech

Clear speech is often used to increase the effectiveness of communication. It is typically used when speaking with those who are hearing impaired or in environments in which communication may be difficult (such as noise or reverberation). Researchers have found that clear speech positively affects intelligibility (Bradlow & Bent, 2002; Ferguson & Kewley-Port, 2002; Johnson, Flemming, & Wright, 1993; Picheny, Durlach, & Braida, 1985a,b). Many acoustic differences between phonemes are enhanced in clear speech produced by native talkers. The speech cues used by Spanish bilinguals during clear speech may give more understanding as to which cues these bilinguals think are important for distinguishing target phonemes.

Native speakers' clear speech is more intelligible than normal or "conversational" speech. Picheny, Durlach, & Braida (1985a) found that clear speech is 17% more intelligible than conversational speech for hard of hearing listeners. Fifty clear and conversational nonsense sentences were presented in quiet to five listeners with stable sensorineural hearing losses at three levels: most-comfortable-level, maximum listening level, and 10 dB below most-comfortable-level. In addition, each listener adjusted the listening level in four different frequency configurations to the highest level comfortable for long-term listening.

Johnson, Flemming, & Wright (1993) reported larger vowel spaces in hyperarticulated (clear) speech versus conversational speech of native speakers.
Therefore, it can be theorized that the cues used by native speakers in clear speech production are the same cues that are important for perception. Vowel spaces of Spanish-speaking bilinguals using clear versus conversational speech have not been studied thus far. Phonetic knowledge of cues is needed in order to produce native-like speech.

Ferguson & Kewley-Port (2002) examined formant frequency measures, degree of spectral change, and duration for target vowels produced in conversational and clear speech style by a single native speaker of American English. In order to assess acoustic differences between the two speaking styles, the authors used several metrics, including target formant values, vowel duration and a vector length measure of spectral change during vowel production. Formant frequency measures in Hertz were converted to the Bark scale (Traunmüller, 1990). The Bark scale was used because equal Bark differences are perceptually equal at different portions of the scale, while equal Hertz differences are not.

Clear speech tokens typically had higher F1 values, but values for F2 frequencies varied among vowels. In clear speech, F2 was higher for front vowels versus back vowels. In general, the vowel space occupied in clear speech was found to be larger than the vowel space occupied in conversational speech. In addition, due to the overall increase in F1 values, the space was shifted to occupy the higher values of F1.
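The Hertz-to-Bark conversion mentioned above can be illustrated with a minimal sketch. It assumes the basic Traunmüller (1990) approximation, z = 26.81 f / (1960 + f) - 0.53, and omits that paper's corrections for very low and very high frequencies; the source does not state exactly which variant Ferguson & Kewley-Port (2002) applied. The function name and the example frequencies are illustrative only.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert a frequency in Hz to Bark using the basic Traunmuller (1990)
    approximation. End corrections for extreme frequencies are omitted
    (an assumption made for this sketch)."""
    if freq_hz < 0:
        raise ValueError("frequency must be non-negative")
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


# The same 100 Hz step spans more Bark at low frequencies than at high ones,
# which is why equal Hertz differences are not perceptually equal.
low_step = hz_to_bark(400.0) - hz_to_bark(300.0)
high_step = hz_to_bark(2400.0) - hz_to_bark(2300.0)

if __name__ == "__main__":
    print(f"300 -> 400 Hz:   {low_step:.3f} Bark")
    print(f"2300 -> 2400 Hz: {high_step:.3f} Bark")
```

Running the script prints the size of each 100 Hz step in Bark; the low-frequency step is the larger of the two.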
When measuring duration, the authors found that the average duration of clear speech tokens was approximately twice that of conversational speech tokens. All ten vowels showed a significant positive effect for duration.

Dynamic formant movement was also studied for both speech styles. Vector length was used to measure the distances between F1 and F2 values at 20% and 80% of the vowel duration. The vector was computed by calculating the Euclidean distance (in Barks) between the F1 and F2 values at 20% and 80% of the vowel duration. Vector length in the more crowded areas of the talker's vowel space was found to be significantly greater in clear speech.

In the perception portion of their study, Ferguson and Kewley-Port (2002) found that young normal hearing (YNH) listeners derived a 15% benefit in intelligibility from the clear speech, compared to the conversational speech; however, elderly hearing impaired (EHI) listeners did not benefit from the clear speech in this study. Both YNH and EHI listeners were presented with a vowel identification task where each word was mixed with a segment of speaker babble. For the YNH listeners, words were presented at an overall level of 70 dB SPL with a speech-to-babble (S/B) ratio of -10 dB. The EHI listeners' S/B ratio was -3 dB. The listeners identified the vowel within each word by typing the vowel's corresponding number on a keyboard.

It was of interest, however, that although the EHI listeners did not benefit from clear speech for vowel identification, they did surpass the YNH listeners' percentage correct vowel identification for the conversational speech tokens.
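The vector length measure of spectral change described above (the Euclidean distance, in Barks, between the F1 and F2 values at 20% and 80% of vowel duration) might be sketched as follows. This is an illustration under stated assumptions, not the cited authors' implementation: the Bark conversion uses the basic Traunmüller (1990) formula without end corrections, and the function names and formant values are hypothetical.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Basic Traunmuller (1990) Hz-to-Bark approximation (assumed variant)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def vector_length_bark(f1_f2_at_20: tuple[float, float],
                       f1_f2_at_80: tuple[float, float]) -> float:
    """Euclidean distance in the Bark-scaled F1-F2 plane between the
    (F1, F2) point at 20% and the (F1, F2) point at 80% of vowel duration.
    Inputs are in Hz."""
    f1_start, f2_start = (hz_to_bark(f) for f in f1_f2_at_20)
    f1_end, f2_end = (hz_to_bark(f) for f in f1_f2_at_80)
    return math.hypot(f1_end - f1_start, f2_end - f2_start)


if __name__ == "__main__":
    # Hypothetical formant values (Hz) for one token, for illustration only.
    length = vector_length_bark((400.0, 2000.0), (600.0, 1800.0))
    print(f"vector length: {length:.2f} Bark")
```

A token whose formants do not move between the two time points has a vector length of zero; larger values indicate more dynamic formant movement.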
This may have been due to the less difficult S/B ratio presented to the EHI listeners.

Bradlow & Bent (2002) studied the clear speech benefit derived by native versus non-native speakers of English. The subjects included 32 non-native listeners of English and 72 native listeners of English. Sixty-four simple English sentences containing three or four key words were recorded by two adult native English speakers, one male and one female. All sentences were produced in conversational and clear speaking styles.

The non-native listeners completed a perception and a production task. The sentences were presented in white noise (first -4, then -8 dB signal-to-noise ratio) and in both speaking styles. Through headphones, the subjects heard either a male or female talker and were told to write down whatever they heard. On a separate day, a word-familiarity rating test was given. Keywords from the sentences were presented on a computer screen with other distractor words, and the subject rated his or her familiarity with that word. Each subject then read the same sentences from the perception task. The authors edited these sentences by adding noise at a +5 dB signal-to-noise ratio (SNR), similar to the sentences used for the perception task.

Thirty-two native listeners participated in a sentence-in-noise perception task, and 40 additional subjects judged the non-natives' sentence production stimuli. The 32 listeners' perception task mirrored that of the non-natives. The 40 judges of the non-natives' production listened and transcribed what they heard. Intelligibility estimates were based on the perception of the judges.
One major finding of this study was that a smaller clear speech benefit was found for the non-native listener group than for the native listener group. In other words, the non-native listeners did not benefit as much from clear speech as did the native listeners. The average clear speech benefit for non-natives was about 5% versus the much larger average benefit of about 16% for the native listeners.

The authors asserted that the finding for the native listeners was similar to those of previous studies that examined hearing impaired adults versus normal hearing (Schum, 1996; Picheny et al., 1985a; Helfer, 1997). In these three studies, the range of the clear speech effect for hearing impaired listeners and normal listeners with degraded signals is 16 to 20%. In a companion study to Bradlow and Bent (2002), Bradlow, Kraus, & Hayes (2003) found that the average clear speech benefit for learning impaired children and non-learning impaired children was the same (about 9%, somewhat lower than that found for adults).

Bradlow & Alexander (2007) found that the non-native listener average clear speech benefit was smaller than the average native listener clear speech benefit. In this study, both native and non-native listeners heard English sentences in plain (conversational) and clear speech that differed in the final word. The clear and conversational sentences were further subdivided into high and low context. The subjects were presented with sentences in noise and were to write the final word on an answer sheet. The authors hypothesized that non-native listener speech-in-noise perception would be improved by both semantic (high context) and acoustic-phonetic (clear speech) enhancements.
Bradlow & Alexander (2007) addressed the limitation of uncontrolled target word predictability in Bradlow & Bent (2002). By doing so, they isolated the effect of clear speech from higher-level semantic-contextual information. From the results, they conclude that non-native listeners do gain a significant benefit from clear speech independent from a decreased ability to use semantic-contextual information.

The authors further suggest that listeners with less exposure to their L1 (i.e., children and non-natives) will eventually develop a greater degree of the clear speech effect with increased exposure to the language in question. The authors maintain that native listeners utilize the language-specific, code enhancements of clear speech, but that non-natives utilize mainly the signal enhancements of clear speech. In other words, native listeners use the exaggerated acoustic distance between contrasting categories (less vowel reduction), increased duration, and the pronunciation norms typically heard in clear speech. Non-natives, they assert, use the overall acoustic improvement of the signal, such as a slower speaking rate, a wider dynamic pitch range and more precise stop consonant releases (Picheny et al., 1985b).

The authors' final remarks (Bradlow & Bent, 2002) include an admission of the need for a better understanding of how talker- and listener-related factors interact to influence overall speech intelligibility. This supports the need for further research in the area of acoustic analysis of bilingual clear speech production.

Purpose of the Present Study
The purpose of this study is to examine the effects of linguistic experience on the acoustic properties of six target vowels produced in clear and conversational speech styles. Three talker groups were recruited: monolingual native English speakers, early (relatively balanced or English dominant) Spanish-English bilinguals and late (primarily Spanish dominant) Spanish-English bilinguals. The acoustic variables analyzed include vowel duration, fundamental frequency and formant frequencies at vowel midpoint (50% of vowel duration), and extent of change in formant frequencies across the target vowel duration (from 20% to 80% of vowel duration).

The present study tests the hypothesis that later and/or early learners of English as a second language may exhibit an exaggerated or restricted degree of change in their production performance between clear and conversational speech styles for certain acoustic cues. On at least some features, the productions of early learners were expected to be similar to those of native speakers. The productions of late learners of English were expected to differ more from those of monolinguals, and certain target vowel pairs (e.g., /i/-/I/) were expected to overlap substantially in their production, especially for the late learners.

The present study differs from previous studies of second language vowel production in that it examines the spectral change of L2 vowels versus vowels produced by native English speakers and examines non-native speakers' ability to modify acoustic properties of vowels when asked to change speaking style.
Chapter Two

Method

Inclusion and Exclusion Criteria

Monolinguals who participated included adults up to age 60 who were native speakers of English. They were required to have no history of speech or hearing impairment or a strong regional accent. Persons who rated themselves as fluent in a second language, or whose parents/caregivers used another language with them as a child, were not included. It was preferred that talkers be born and raised in the Tampa Bay area, but other subjects not fitting this criterion were allowed.

Bilinguals who participated included adults up to age 60 who were native speakers of any New World variety of Spanish (Caribbean, South American, Central American, or Mexican). They were required to have no history of speech or hearing impairment, nor to speak any languages other than Spanish and English. The Spanish talkers were further divided into two groups consisting of ten late bilinguals and 15 early bilinguals, based on their age of onset of immersion in an English-speaking environment (AOI). The experienced early bilinguals' English AOI was age 12 or under. Furthermore, this group rated themselves as English dominant or balanced in at least two modalities (listening, speaking, reading and writing), one of which was required to be non-print (i.e.,
must be listening or speaking). The less experienced late bilinguals' English AOI was age 15 or later.

Participants were recruited through flyers placed around the university campus. All participants were prescreened over the phone for inclusion criteria. Each participant was paid $20 upon completion of the one-hour recording session, which was preceded by a one-hour session of perceptual testing (associated with a related experiment) on a preceding day.

Participants

The participants included in the results comprised three groups of talkers: 1) ten native English speakers (monolinguals, MO); 2) 15 early Spanish-English bilinguals (EB); and 3) ten late Spanish-English bilinguals (LB). Males and females were recruited equally; however, more females than males volunteered for all three groups, so that less than one fourth of any group was represented by males. The male participants were therefore dropped from the study because they represented a low proportion of all three groups. A gender effect on degree of intelligibility difference between clear and conversational speech was found by Ferguson and Kewley-Port (2004). With the small proportion of males, gender effects could not easily be analyzed and their effects on the data would therefore be unknown.

Other female participants who did not fit the criteria were allowed to participate, but were later dropped after detailed reading of their questionnaires. Of the total participants recruited, data for ten of 24 monolinguals, 15 of 33 early bilinguals, and 10 of 21 late bilinguals were included for analysis in the present
study. Some participants were dropped from acoustic analysis because their voice quality caused automatic formant tracking to be unreliable.

Table 1. Demographic data for early bilingual talkers. Data are displayed for gender; age; country of origin (of listener or listener's parents if born in the U.S.); age of onset of immersion in an English-speaking environment (AOI); number of years spent living in the U.S.; and self-ratings of language dominance (E=English; S=Spanish; B=balanced) for the skills of listening, speaking, reading and writing.

Language background information (Speak, Listen, Read, Write = language most comfortable for each skill)

Code   Age   Born/Raised in US?   Country       AOI   Speak   Listen   Read   Write
EB05   19    Y                    Cuba          4.5   E       E        E      E
EB06   19    N                    Mexico        5     B       B        E      E
EB08   19    N                    Nicaragua     8     E       E        E      E
EB10   19    Y                    Nicaragua     6     B       B        B      B
EB11   20    Y                    Cuba          6     E       E        E      E
EB12   24    N                    Puerto Rico   10    E       E        E      E
EB16   19    Y                    Mexico        6     S       E        E      E
EB17   19    Y                    Cuba          4     E       E        E      E
EB19   18    Y                    Cuba          4     E       E        E      E
EB24   26    Y                    Colombia      5     E       E        E      E
EB25   21    N                    Colombia      11    E       E        E      E
EB26   26    N                    Venezuela     12    B       B        E      E
EB29   19    Y                    Cuba          2     B       B        E      E
EB30   19    N                    Venezuela     8     B       B        B      E
EB33   22    N                    Colombia      6     S       E        E      S

Avg./Sum.: Age 20.6; Born/Raised in US 8 Y, 7 N; Country 3 Colom., 2 Venez., 5 Cuba, 2 Mexico, 3 Other; AOI 6.5; Speak 8 E, 5 B, 2 S; Listen 10 E, 5 B, 0 S; Read 13 E, 2 B, 0 S; Write 13 E, 1 B, 1 S
Table 2. Demographic data for late bilingual talkers. Data are displayed for gender; age; country of origin (of listener or listener's parents if born in the U.S.); age of onset of immersion in an English-speaking environment (AOI); number of years spent living in the U.S.; and self-ratings of language dominance (E=English; S=Spanish; B=balanced) for the skills of listening, speaking, reading and writing.

Language background information (Speak, Listen, Read, Write = language most comfortable for each skill)

Code   Age   Born/Raised in US?   Country       AOI   Speak   Listen   Read   Write
LB01   30    N                    Panama        21    E       S        B      B
LB06   19    N                    Colombia      16    S       S        S      S
LB07   50    N                    Colombia      45    S       S        S      S
LB10   28    N                    Colombia      28    S       S        S      S
LB11   22    N                    Colombia      22    S       S        S      S
LB13   19    N                    Puerto Rico   16    S       S        S      S
LB15   22    N                    Colombia      18    S       S        S      S
LB16   49    N                    Colombia      46    S       S        S      S
LB19   22    N                    Cuba          19    S       S        E      E
LB21   21    N                    Colombia      18    S       S        S      S

Avg./Sum.: Age 29.6; Born/Raised in US 10 N; Country 11 Colom., 1 Cuba, 3 Other; AOI 24.9; Speak 13 S, 1 E; Listen 14 S; Read 11 S, 2 E, 1 B; Write 12 S, 1 E, 1 B

Materials

Six target vowels /i, I, e, E, ae/ and /a/, embedded in /bVd/ context, were used as stimuli for the experiment. The target words were written as "bead, bid, bayed, bed, bad" and "bod" and were embedded in the carrier phrase "Say _______ again."

Digitization and recording equipment included an Audio-Technica AT4033 condenser microphone, an Applied Research and Technology microphone preamplifier with 48V phantom power supply, a Roland VS890 Digital Studio Workstation recorder, and Sennheiser HD265 headphones.
Editing software used included a signal editing software program (CoolEdit 2000, 2000) and Praat speech analysis software (Boersma & Weenink, 2006).

The digitization/recording equipment was configured with the microphone connected to the input channel of the Applied Research and Technology microphone preamplifier. The preamplifier was connected to an analog input channel of the Roland VS890 Digital Studio Workstation. Recordings were digitized at 44.1 kHz with 24 bit resolution on AD conversion (64 times oversampling). An antialiasing filter (20 kHz) was used and filtering automatically performed by the workstation; the effective response range was 20 Hz to 20 kHz. The written stimuli were presented to talkers on a 15-inch flat screen monitor located inside the recording booth. The CPU of the computer was located outside of the recording booth.

Following recording, the experimenter transferred the files from the digital workstation to a PC. The files were transferred digitally using coaxial cable connected from the digital output of the workstation to the digital input of an M-Audio Audiophile 2496 sound card installed on the computer. Each recording session was transferred digitally, with separate files for conversational and clear speech stimuli.

Recording Procedure

Three experimenters conducted the recording of stimuli by the talkers. All were trained and judged by a trained linguist (the major professor) to be consistent in procedural manner.
An informed consent document, a race-ethnicity form and a language background questionnaire were filled out by every participant recruited. Each talker was recorded in a single-wall sound attenuating booth (IAC). Recording equipment (other than microphone) was located outside the booth. The microphone was positioned approximately six inches from the talker's mouth and located at a 45 degree angle from the talker's mouth. Recording levels were monitored and adjusted as needed by the experimenters to avoid peak clipping and to maintain sufficiently high input amplitude.

There were two different speech styles (conversational and clear) produced by each talker. The experimenter showed the stimulus words to the talkers and read them aloud to the talker in order to avoid orthographic errors. Distractor words were included in the conversational style reading list to keep talkers from focusing too much on the /bVd/ frame of the target words. Distractor words were all single syllable /CVC/ (but not /bVd/) words (e.g., "cut, cape"). Target and distractor words were intermixed for the conversational condition. For the clear speech condition, only the target words were used.

Each word (embedded in the carrier phrase, e.g., "Say bad again") was presented using a Microsoft PowerPoint presentation file. A separate monitor and keyboard with dual control were located outside the recording booth. When the subject finished saying the sentence, the experimenter clicked on the screen (or pressed the right arrow key) to present the next sentence.

Twelve practice trials (one for each target and each distractor word) were conducted. On each practice trial, the subject heard the sentence to be read over
headphones and saw the text displayed on the screen. The subject was instructed to repeat the sentence in a normal speaking style. Audio of the 12 sentences to be repeated was produced by a single male talker (a monolingual native English speaker), recorded using the same procedures and equipment described above. These recorded stimuli were transferred to the computer in the same way as described above. Each target phrase was saved to a separate file for presentation during the practice trials.

During the conversational style trials, the subject was instructed to remove the headphones used for the practice trials. The text of each target sentence was presented on the screen and the subject was instructed to read each sentence aloud in a normal speaking style. Each talker produced seven repetitions of each target and distractor word, for a total of 84 target sentences produced in the conversational style. Four lists of 21 sentences each were read by each talker with an opportunity for a short break given between each block of 21 sentences. The 84 target and distractor words were pseudorandomized so that no more than two /bVd/ words occurred in a row. Approximately half of the /bVd/ target words for each vowel were presented in the first two lists.

During the clear style trials, the talkers were instructed that some of the sentences they had produced needed to be spoken more clearly, as if speaking to someone who doesn't understand. The subjects were not given any particular instructions as to how to produce clear tokens. No distractor words were used for this condition. Each talker produced seven repetitions of each target word, for a total of 42 target sentences. Two lists of 21 sentences each were read by each
talker with an opportunity for a short break given between each block of 21 sentences. The 42 words were pseudorandomized so that no target word occurred two times in a row. The entire recording session took approximately one hour for each talker, including completion of consent forms and questionnaires.

Editing Procedure

Two trained experimenters edited all recorded target-word stimuli into isolated words. Each larger file (for session or style) was opened and subsequently edited in CoolEdit 2000. Each list of 21 sentences was isolated from the larger file and saved to a separate file. Each sentence containing a target word in the list of 21 sentences was then edited to isolate the target word only.

The target word was isolated by first locating and selecting the release of the initial /b/, plus 20 ms of the waveform preceding the /b/ release. The contents of the file preceding this 20 ms buffer were then deleted. The first 10 ms of the 20 ms buffer were then silenced. In cases where prevoicing of /b/ occurred, the next 3 ms were selected and linearly ramped from 0 to 100% of the original amplitude to prevent the perception of a click. Thus, the initial /b/ and up to 10 ms of prevoicing were preserved in the isolated word files. Next, the release of the word-final /d/ was located and selected on the waveform, plus 20 ms of the waveform following the /d/ release. The contents of the file following this 20 ms buffer were then deleted. The last 10 ms of the word-final 20 ms buffer were then silenced. Then the 3 ms of energy preceding the last 10 ms were linearly ramped from 100 to 0% of the original amplitude, again to prevent the perception of a
click. Thus the release of the word-final /d/ and 10 ms of the energy following were preserved in the isolated word files. Finally, the remaining waveform was saved to a new isolated word file.

Two of the seven tokens recorded from each talker for each of the target words were selected for analysis in the present study. The first and second tokens produced by each talker were used unless there was disfluency or poor voice quality or the talker clearly made an error in reading the word. If a token was not usable, the experimenter examined additional repetitions until an acceptable one was found.

Prior to acoustic analysis, all isolated word files were amplitude equalized for use in a separate experiment. For equalization, the average RMS of each file was set to -25 dB from the maximum amplitude. To accomplish this, the full duration of the isolated word file (including the 10 ms of silence at the beginning and end) was selected and then the file's average RMS was computed using an automated procedure (CoolEdit 2000, 2000). The difference from -25 dB was computed and the amplitude adjustment procedure in CoolEdit was used to adjust amplitude up or down by the desired number of dB to get the average RMS of the file to equal -25 dB. After amplitude adjustment, equalization was double checked by again obtaining the average RMS for the entire file and checking that it was equal to -25 dB.

Settings for Acoustic Analysis

All time and frequency measurements described below were made using Praat (Boersma & Weenink, 2006). The following settings were used, except in
cases where formant tracking did not provide a good match to observed formants on the wide-band spectrogram (see below): window length for spectrogram = 5 ms (wide-band spectrogram); spectrogram display range = 0-5500 Hz; spectrogram display dynamic range = 50 dB (Praat default); pre-emphasis for spectrogram display = 6 dB/octave (Praat default); method for automatic tracking of F0 = autocorrelation; range for F0 tracking = 75-500 Hz; method for formant tracking = Burg; pre-emphasis starting frequency for formant tracking = 50 Hz; number of formants to be tracked within 0-5500 Hz = 4, 5, or 6, depending on the experimenter's judgment based on visual inspection of the agreement between formant tracks and formants observed on the wide-band spectrogram; window length for formant tracks = 20 ms.

Vowel Duration Measurement

Measurement of vowel duration was performed by two trained experimenters (the author and a trained assistant). Agreement was checked and any additional measurement needed was performed by a trained linguist (the major professor). Criteria for determining vowel duration were specific. For the beginning of the vowel (vowel onset), experimenters located on the waveform the first large positive amplitude peak following the maximum negative of the first periodic cycle that had the same pattern as the rest of the vowel (i.e., not part of pre-voicing). The onset of F2 on the wide-band spectrogram was also used to confirm the location of the vowel onset. The first pulse where F2 was visible was a landmark for vowel onset. Typically, the waveform and spectrogram criteria for vowel onset agreed well; when they did not, the experimenters selected one of
the two criteria using their best judgment to determine the location of the vowel onset.

For the end of the vowel (vowel offset), the experimenters used the waveform display to locate the peak of the first negative pulse of the last cycle of voicing that had a similar shape as the rest of the vowel (last cycle prior to closure, not including the more sinusoidal cycles occurring during voicing during closure). The offset of F2 on the wide-band spectrogram was also used to confirm the location of the vowel offset. The last pulse where F2 was visible during the vowel was the spectrographic landmark for the vowel offset. Typically, the waveform and spectrogram criteria for vowel offset agreed well; when they did not, the experimenters selected one of the two criteria using their best judgment to determine the location of the vowel offset. Vowel onset and offset measures for each selected token were copied and saved to a spreadsheet. A spreadsheet formula automatically computed vowel duration and locations for 20%, 50% and 80% of vowel duration when onset and offset data were entered.

When all vowel onset and offset measurements were completed independently by the two student experimenters, the trained linguist used a spreadsheet formula to determine agreement for the vowel onset and offset taken by the two student experimenters. The agreement criterion was set to 5 ms, which is approximately one pitch period for the average female, rounded to the nearest ms. That is, the average fundamental frequency (F0) for females is 219 Hz according to Hillenbrand et al. (1995), which converts to 4.57 ms per pitch period. The criterion for one pitch period for agreement was adapted from
Strange, Yamada, Kubo, Trent, Nishi & Jenkins (1998). For consistency's sake, the time measurements of a single student experimenter (the author) were used as the landmarks for frequency measurements for all instances in which the two students agreed.

The times of vowel onset and/or vowel offset were remeasured by the trained linguist for all tokens for which the measures of vowel onset or vowel offset of the two student experimenters disagreed by more than 5 ms. In nearly every case, the measurement of the trained linguist agreed with that of one of the student experimenters. In the few cases where the measurement of the trained linguist did not agree with that of either of the students, the trained linguist rechecked the measurement and recorded her own measurements in the spreadsheets of both raters.

Frequency Measurements

Following time agreement measurement, fundamental frequency (F0) and the frequencies of the first four formants (F1-F4) were measured at the time points of 20, 50 and 80% of the vowel duration. Only measurements for duration, F1 and F2 will be used for the present thesis. As stated above, the time points of a single rater (identity dependent on agreement) were used to determine points from which to make formant measurements.

Frequency measurements were performed by three trained experimenters and the trained linguist. Frequency measurements were made by two of these four persons for each token and recorded to separate spreadsheets; agreement between the data on the two spreadsheets for each token was then computed by
the trained linguist. Agreement criteria for F1, F2 and F3 were ±50, ±150 and ±250 Hz respectively, following Strange et al. (1998). The agreement criterion for F4 was the same as for F3 (±250 Hz). In cases of agreement between the two spreadsheets, the measurements from a single spreadsheet (that of the author) were used. In cases where agreement within the specified criteria was not found, frequency measurements were made by a third experimenter and values for which at least two raters agreed were subsequently used; in the rare cases where all three raters disagreed, the measurements of the trained linguist were used.

For measurement of F0, automatic measurements were used almost exclusively. In the rare instances where the pitch tracking appeared to be in error, measurements were made by hand from the waveform by measuring the duration of the target pitch period and converting to Hz.

Two measurement techniques were used for measurement of formant frequencies. Automatic formant tracking was used in most cases, but analysis by hand was used in some cases. For automatic analysis, the automatic formant tracking feature (Formant > Show Formants) was used to overlay formant tracks on the wide-band spectrogram display. The Praat (Boersma & Weenink, 2006) query feature was then used to automatically obtain the locations of F1-F4 and this information was then pasted into the spreadsheet for each token. The number of formants chosen as a setting in the automatic formant tracker was modified based on experimenter estimation of the best match between the formant tracker setting and the formants observed on the wide-band
spectrogram. Any extra formant tracks seen on the display (between formants observed on the wide-band spectrogram) were skipped for the purpose of measurement. The number of formants used for tracking was four, five or six for each token; this information was also recorded in the spreadsheet for each token.

Hand analysis from a narrow-band spectral slice was used for tokens that did not yield reliable formant tracks using the automatic formant tracking feature. This method was adapted from Monsen & Engebretson (1983). For this procedure, the spectrogram display was converted to a narrow-band spectrogram by specifying a 29 ms analysis window. Then a spectral slice (frequency by amplitude display) was generated for the desired time point using an automatic feature of Praat (Boersma & Weenink, 2006). The frequency range 0-5500 Hz was selected for display. The location of the first four formants (or the desired formant or formants) was determined by clicking on the estimated location of the formant, causing a cursor to appear at that point. The frequency value at the cursor was automatically obtained by the Praat (Boersma & Weenink, 2006) query procedure and pasted into the spreadsheet. The formant locations were determined by visually estimating the location of the peaks in the spectrum according to the method described in Monsen & Engebretson (1983), in which a hypothetical triangle is created and superimposed over prominent harmonics and the peak of the triangle is adjusted to the left or right to a position that would result in the harmonic amplitude relationship observed. All formant frequency measurements determined by hand were noted as such in
the spreadsheet by each experimenter. Formant frequency measurements were converted to the Bark scale for statistical analysis (Traunmüller, 1990).
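The timing and scale arithmetic described in this chapter can be sketched in a few lines of Python. This is an illustrative sketch only: the original work used spreadsheet formulas, the function names and the example measurements below are hypothetical, and the Bark conversion uses the basic Traunmüller (1990) formula without his additional corrections for very low and very high values (an assumption, since the text does not say whether those corrections were applied).

```python
def time_landmarks(onset_ms, offset_ms):
    """Return vowel duration and the 20%, 50% and 80% time points (all in ms)."""
    duration = offset_ms - onset_ms
    points = {pct: onset_ms + duration * pct / 100 for pct in (20, 50, 80)}
    return duration, points


def raters_agree(a, b, criterion):
    """True when two raters' measurements differ by no more than the criterion.

    The criterion is 5 ms for onset/offset times and 50, 150, 250 and 250 Hz
    for F1, F2, F3 and F4 respectively.
    """
    return abs(a - b) <= criterion


def pitch_period_ms(f0_hz):
    """Duration of one pitch period in ms for a given F0 in Hz."""
    return 1000.0 / f0_hz


def hz_to_bark(f_hz):
    """Basic Traunmüller (1990) Hz-to-Bark conversion (no end corrections)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Hypothetical token: vowel onset at 100 ms, offset at 300 ms.
duration, points = time_landmarks(100.0, 300.0)
print(duration, points)

# An average female F0 of 219 Hz gives roughly one 4.57 ms pitch period.
print(round(pitch_period_ms(219.0), 2))

# Hypothetical rater pairs: onset times (ms) and F1 values (Hz).
print(raters_agree(412.0, 415.5, 5.0))    # within the 5 ms criterion
print(raters_agree(520.0, 590.0, 50.0))   # outside the 50 Hz criterion for F1

# A hypothetical formant value converted to Bark.
print(round(hz_to_bark(1000.0), 2))
```

Keeping the agreement check as a single function with the criterion passed in mirrors the way one rule (a maximum allowed difference) was applied with different thresholds to times and to each formant.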
Chapter Three

Results

Four separate three-way mixed-design analyses of variance (ANOVAs) were performed on four dependent variables (see below). In each case, the between-subjects independent variable was talker group (three levels: MO, EB and LB) and the within-subjects independent variables were speaking style (two levels: conversational and clear) and target vowel (six levels: /i, I, e, E, ae, a/). In each case, simple main effects post-hoc comparisons were used to explore significant effects and interactions.

The following dependent variables were derived directly from the vowel measurements described above: vowel duration (measured in ms), F1 (in Barks) at 50% of vowel duration and F2 (in Barks) at 50% of vowel duration. In addition, the two-point vector length for F1-F2 frequencies from 20% of the vowel duration to 80% of the vowel duration was computed by finding the Euclidean distance (in Barks) between the F1-F2 frequencies at these two time points (cf. Ferguson & Kewley-Port, 2002). These values were then used as the dependent variable in a fourth three-way mixed-design ANOVA. Note that the F0, F3 and F4 values for all target vowels, talker groups, and speaking styles are awaiting further analysis.
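As a minimal sketch of the two-point vector length described above, the following Python computes the Euclidean distance in Barks between the (F1, F2) points at 20% and 80% of vowel duration. The function names and the formant values are hypothetical, and the Hz-to-Bark step assumes the basic Traunmüller (1990) formula; it is not the author's spreadsheet implementation.

```python
import math


def hz_to_bark(f_hz):
    """Basic Traunmüller (1990) Hz-to-Bark conversion (assumed form)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def vector_length(point_20, point_80):
    """Euclidean distance between two (F1, F2) points given in Barks."""
    d_f1 = point_80[0] - point_20[0]
    d_f2 = point_80[1] - point_20[1]
    return math.hypot(d_f1, d_f2)


def vector_length_from_hz(f1_20, f2_20, f1_80, f2_80):
    """Convert four Hz measurements to Barks, then take the vector length."""
    p20 = (hz_to_bark(f1_20), hz_to_bark(f2_20))
    p80 = (hz_to_bark(f1_80), hz_to_bark(f2_80))
    return vector_length(p20, p80)


# Hypothetical token: F1 falls and F2 rises across the vowel.
print(round(vector_length_from_hz(600.0, 2100.0, 450.0, 2400.0), 2))

# A token with no formant movement has a vector length of zero.
print(vector_length_from_hz(500.0, 2000.0, 500.0, 2000.0))
```

Because the distance is taken after conversion to Barks, equal movements in Hz contribute less at high frequencies than at low ones, which is the point of using an auditory scale for this measure.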
Vowel Duration

Table 3. Statistical results for vowel duration. Data on F values, degrees of freedom (df) and levels of significance (p values) for all main effects and interactions in the three-way ANOVA of the effects of talker group, speaking style and target vowel on duration of target vowels. Significant effects are indicated by an asterisk.

Effect                                              F (df)               p value
Main effects
  Talker group                                       .215 (2,32)         .808
  Speaking style                                    *88.79 (1,32)        <.001
  Target vowel                                      *125.08 (5,160)      <.001
Two-way interactions
  Talker group by speaking style                     .47 (2,32)          .631
  Talker group by target vowel                      *4.70 (10,160)       <.001
  Speaking style by target vowel                    *4.00 (5,160)        .002
Three-way interaction
  Talker group by speaking style by target vowel    *2.81 (10,160)       .003

Table 3 summarizes results for the three-way ANOVA on vowel duration. Significant main effects were found for speaking style and target vowel. The two-way interactions of talker group by target vowel and speaking style by target vowel were significant. The three-way interaction was also significant. F values
and p values for each significant effect are shown in Table 3. Only the three-way interaction will be discussed in detail because it alters the other effects.

Figure 1. Average durations (in ms) of target vowels for words produced in conversational and clear speech styles. MO = monolingual talkers (panel A); EB = early bilingual talkers (panel B); LB = late bilingual talkers (panel C).
[Figure 1 panels: A: MO duration; B: EB duration; C: LB duration. x-axis: target word (bead, bid, bayed, bed, bad, bod); y-axis: vowel duration (ms), 100-400; series: Conv, Clear.]
Figure 1 shows mean vowel durations (in ms) for each speaking style and target vowel, with a separate panel for each talker group. As can be seen from the figure, the "long vowels" (in particular the vowels in the words "bead", "bayed" and "bad") appear to be lengthened in clear speech more than their neighboring shorter vowels for the MO and EB talker groups (see Figures 1A and 1B). Thus, vowel durations are better distinguished for neighboring vowels in clear than in conversational speech for these two talker groups. For the LB talkers, on the other hand, the vowels in "bayed" and "bead" are lengthened less in clear speech than their neighboring vowels, effectively reducing the degree of inherent vowel differences in clear speech (see Figure 1C).

Post-hoc tests comparing vowel durations within each level of group and style confirm these observations. For the MO talker group, the vowels / / and / / did not differ significantly in duration in conversational speech (10 ms difference) but did in clear speech (20 ms difference). Although the duration difference between /i/ and /I/ was significant in both styles, it increased from about 28 ms in conversational speech to about 66 ms in clear speech.

For the EB talkers, the durations of the vowels /e, / and / / were all within 8 ms of one another and did not differ significantly in conversational speech. In clear speech, / / was significantly longer than both /e/ and / / (by 20 and 27 ms, respectively). Furthermore, the difference in duration between the vowels /i/ and /I/ increased from 22 ms to 48 ms from conversational to clear speech; the duration difference between /i/ and /I/ was significant for both styles.
For the LB talkers, on the other hand, the duration difference between the vowels /e/ and / /, while significant in both styles, decreased from 74 ms in conversational speech to 40 ms in clear speech. Similarly, the vowel /i/ is 22 ms longer than /I/ in conversational speech (a significant difference), but only 11 ms longer in clear speech (a non-significant difference). Together, the vowel duration results show the MO and EB talkers emphasizing vowel duration differences between neighboring vowels in clear speech. The LB talkers show less differentiation in duration between neighboring vowels in clear than in conversational speech.

F1 at 50% of Vowel Duration

Table 4 summarizes results for the three-way ANOVA on F1 at 50% of vowel duration. A significant main effect was found for target vowel only. The two-way interactions of speaking style by talker group, talker group by target vowel and speaking style by target vowel were significant. The three-way interaction was not significant. F values and p values for each significant effect are shown in Table 4.
Table 4. Statistical results for F1 at 50% of vowel duration. Data on F values, degrees of freedom (df) and levels of significance (p values) for all main effects and interactions in the three-way ANOVA of the effects of talker group, speaking style and target vowel on the value of F1. Significant effects are indicated by an asterisk.

Effect                                              F (df)               p value
Main effects
  Talker group                                       1.59 (2,32)         .219
  Speaking style                                     .07 (1,32)          .790
  Target vowel                                      *607.30 (5,160)      <.001
Two-way interactions
  Talker group by speaking style                    *3.60 (2,32)         .039
  Talker group by target vowel                      *9.01 (10,160)       <.001
  Speaking style by target vowel                    *2.61 (5,160)        .027
Three-way interaction
  Talker group by speaking style by target vowel     .90 (10,160)        .534

Figures 2 and 3 show average Bark-frequency values of F1 (y-axis) and F2 (x-axis) at 50% of vowel duration for conversational (solid lines) and clear speech vowels (dashed lines). Each talker group is shown as a separate panel; data for the monolingual talker group are repeated in Figures 2 and 3 for easier comparison. Both axes are shown with values in reverse order, for better representation of jaw height and tongue position locations.
Figure 2. Average steady-state (50% of vowel duration) F1 and F2 frequencies (in Barks) for vowels in conversational and clear speech (MO and EB talkers). MO = monolingual talkers (panel A); EB = early bilingual talkers (panel B).
[Figure 2 panels: A: MO 50%; B: EB 50%. x-axis: F2 (Barks), 8-18; y-axis: F1 (Barks), 2-12; series: Conv, Clear; points labeled bead, bid, bayed, bed, bad, bod.]
Figure 3. Average steady-state (50% of vowel duration) F1 and F2 frequencies (in Barks) for vowels in conversational and clear speech (MO and LB talkers). MO = monolingual talkers (panel A); LB = late bilingual talkers (panel B).
[Figure 3 panels: A: MO 50%; B: LB 50%. x-axis: F2 (Barks), 8-18; y-axis: F1 (Barks), 2-12; series: Conv, Clear; points labeled bead, bid, bayed, bed, bad, bod, with "bid (clear)" labeled separately in panel B.]
As can be seen from the figures, F1 values are slightly lower (indicating a higher tongue/jaw position) in clear than in conversational speech for the vowels /i, I, e/ and / / for the MO talkers (Figure 2A) and for the vowels /i, I/ and / / for the EB talkers (Figure 2B). For the LB talkers (Figure 3B), only / / and /I/ show decreases in F1 from conversational to clear speech. The values for / / and / /, on the other hand, are higher in clear than in conversational speech for the LB talkers, as is that for / / for the EB talkers, indicating a lowering of tongue/jaw position in clear speech.

A comparison of Figures 2A and 2B shows only minor differences in F1 values between the MO and EB talker groups. The relative positions and distances between the vowels on the F1 axis are nearly identical for the two groups. A comparison of Figures 3A and 3B, however, shows quite noticeable differences in vowel location between the MO and LB talkers. The vowels /i/ and /e/ are located lower in the vowel space (higher F1) for LB than for MO (and EB) talkers. The vowels /I, / and / /, by contrast, are located higher in the vowel space (lower F1) for LB than for MO talkers. Thus, the maximum F1 distance between vowels appears to be reduced for the LB talkers, compared to the MO and EB talkers.

The post-hoc comparisons for the speaking style by talker group interaction revealed no significant speaking style effects for any of the three groups; however, the MO talkers' F1 values were nearly significantly lower in
clear than in conversational speech (p = .057), partially confirming the observation of lower F1 values in clear speech for certain vowels. For the LB talkers, there is a nearly significant increase in F1 values from conversational to clear speech (p = .079), partially confirming the higher F1 values observed in clear speech for / / and / /.

Post-hoc analyses of the group by vowel interaction showed significantly higher F1 values for LB than for MO and EB talkers for the vowels /i/ and /e/, confirming the observation of a lower position in the vowel space for these vowels. LB talkers had significantly lower F1 values than MO and EB talkers for the vowels /I/ and / /, confirming the observation of a higher position in the vowel space for these vowels. Finally, LB talkers had significantly lower F1 values than EB talkers for the vowel / /, indicating a higher position in the vowel space. No significant differences in F1 values were found between MO and EB talkers.

Post-hoc comparisons of individual vowels' F1 values within each group showed all vowels to differ significantly from one another for both the MO and EB talker groups. The order of the F1 values was also the same for these two groups. For the LB talkers, no significant difference in F1 frequency was found between /i/ and /I/. Otherwise, all of the vowels differed significantly in F1 for the LB talkers, and the order of the F1 values was the same as for the other two groups.

Post-hoc analysis of the style by vowel interaction showed a significantly lower F1 value in clear than in conversational speech for the vowel /I/ (indicating
a higher position in the vowel space) and a significantly higher F1 value in clear than in conversational speech for the vowel / / (indicating a lower position in the vowel space). No other vowels showed significant differences between clear and conversational speaking styles.

F2 at 50% of Vowel Duration

Table 5 summarizes results for the three-way ANOVA on F2 at 50% of vowel duration. Significant main effects were found for talker group, speaking style and target vowel. The two-way interactions of talker group by target vowel and speaking style by target vowel were significant. The three-way interaction was not significant. F values and p values for each significant effect are shown in Table 5. F2 values are shown along with F1 values for each talker group, target vowel and speaking style in Figures 2 and 3.

An examination of Figure 2A shows that all of the MO talkers' vowels except / / have slightly higher F2 values (are slightly more fronted) in clear than in conversational speech. A similar but smaller pattern is shown for the EB talkers (see Figure 2B). Figure 3B shows this pattern for the LB talkers only for the vowels /I/ and / /; however, the LB talkers' production of /I/ is sufficiently fronted (and raised) in the clear speech style that it nearly completely overlaps with target /i/. A comparison of Figures 2A and 2B also shows higher F2 values for the EB talkers than for the MO talkers for all six of the target vowels, suggesting that all vowels are slightly more fronted for the EB talkers than for the MO talkers, regardless of speaking style.
Table 5. Statistical results for F2 at 50% of vowel duration. Data on F values, degrees of freedom (df) and levels of significance (p values) for all main effects and interactions in the three-way ANOVA of the effects of talker group, speaking style and target vowel on the value of F2. Significant effects are indicated by an asterisk.

Effect                                              F (df)               p value
Main effects
  Talker group                                      *3.66 (2,32)         .037
  Speaking style                                    *9.00 (1,32)         .005
  Target vowel                                      *932.14 (5,160)      <.001
Two-way interactions
  Talker group by speaking style                     1.83 (2,32)         .103
  Talker group by target vowel                      *11.37 (10,160)      <.001
  Speaking style by target vowel                    *5.12 (5,160)        <.001
Three-way interaction
  Talker group by speaking style by target vowel     1.35 (10,160)       .210

Post-hoc comparisons of the vowel by group interaction showed significant group differences for all of the target vowels, but the order of the groups' F2 values varied across target vowels. For /i, I/ and / /, all three groups differed significantly from one another in their F2 values. For /i/, F2 values were significantly higher for EB talkers than for MO and LB talkers, and values for MO
talkers were significantly higher than those for LB talkers. These differences indicate a more front position in the vowel space for the EB talkers than for the MO talkers and for the MO talkers than for the LB talkers. For /I/ and / /, F2 values were significantly lower for the MO talkers than for the EB and LB talkers and lower for the EB talkers than for the LB talkers. These differences indicate a more back position in the vowel space for the MO talkers than for the EB talkers and for the EB talkers than for the LB talkers.

For the vowels /e/ and / /, F2 values were significantly higher for the EB talkers than for the LB talkers, but the F2 values for the MO talkers did not differ significantly from those for either of the other two groups. Similar to /i/, these differences indicate a more front position for the EB than for the LB talkers. For the vowel / /, F2 values were significantly lower for the MO group than for the LB group, but the F2 values for the EB talkers did not differ significantly from those for either of the other two groups. Similar to the results for / /, these differences indicate a more back tongue position for the MO than for the LB talkers.

Overall, the group by vowel effect shows a smaller distance between the vowels /i/ and /a/ (most front vs. most back) for the LB talkers (/i/-/a/ distance = 4.2 Barks) than for the MO and EB talker groups (/i/-/a/ distance = 5.1 Barks).

Post-hoc comparisons of individual vowels' F2 values within each group showed all vowels to differ significantly from one another for both the MO and EB talker groups. The order of the F2 values was also the same for these two
groups. For the LB talkers, no significant difference in F2 frequency was found between /i/ and /I/ or between /I/ and /e/. The other three vowels differed significantly in F2 from one another (and from /i, I/ and /e/) for the LB talkers, and the order of the F2 values for these vowels was the same as for the other two groups. The F2 difference between / / and / / was about .8 Barks smaller for the LB than for the MO group; however, the F2 difference between / / and / / was about 1.3 Barks larger for the LB than for the MO group (due to the placement of / / higher in the vowel space for the LB talkers).

Post-hoc analysis of the style by vowel interaction showed a significantly higher F2 value in clear than in conversational speech for the vowels /i, I/ and / / (indicating a more front position in the vowel space) and a nearly significantly lower F2 value in clear than in conversational speech for the vowel / / (indicating a more back position in the vowel space). No other vowels showed significant differences between the clear and conversational speaking styles.

Length of Vector from 20% to 80% of Vowel Duration

Table 6 summarizes results for the three-way ANOVA on length of the vector in the F1-F2 space from 20% to 80% of the vowel duration. Significant main effects were found for speaking style and target vowel only. No interactions were statistically significant. F values and p values for each significant effect are shown in Table 6.
Table 6. Statistical results for two-point vector length. Data on F values, degrees of freedom (df) and levels of significance (p values) for all main effects and interactions in the three-way ANOVA of the effects of talker group, speaking style and target vowel on the value of the Euclidean distance between F1-F2 frequencies at 20% and 80% of vowel duration. Significant effects are indicated by an asterisk.

Effect                                              F (df)               p value
Main effects
  Talker group                                       1.06 (2,32)         .357
  Speaking style                                    *13.08 (1,32)        .001
  Target vowel                                      *59.24 (5,160)       <.001
Two-way interactions
  Talker group by speaking style                     .72 (2,32)          .495
  Talker group by target vowel                       .95 (10,160)        .491
  Speaking style by target vowel                     .88 (5,160)         .494
Three-way interaction
  Talker group by speaking style by target vowel     .42 (10,160)        .934
Figure 4. Average F1 and F2 frequencies (in Barks) at 20% and 80% of vowel duration for vowels in conversational (black arrows) and clear (gray arrows) speech (MO and EB talkers). The arrowhead indicates performance at 80% of vowel duration.
[Figure 4 panels: A: MO formant dynamic; B: EB formant dynamic. x-axis: F2 (Barks), 8-18; y-axis: F1 (Barks), 2-12; vectors labeled bead, bid, bayed, bed, bad, bod.]
Figure 5. Average F1 and F2 frequencies (in Barks) at 20% and 80% of vowel duration for vowels in conversational (black arrows) and clear (gray arrows) speech (MO and LB talkers). The arrowhead indicates performance at 80% of vowel duration.
[Figure 5 panels: A: MO formant dynamic; B: LB formant dynamic. x-axis: F2 (Barks), 8-18; y-axis: F1 (Barks), 2-12; vectors labeled bead, bid, bayed, bed, bad, bod.]
Figures 4 and 5 show the vectors in the F1-F2 space from 20% to 80% of the vowel duration for each target vowel in conversational (black lines) and clear (gray lines) speech. Figures 4A and 4B show the MO and EB talkers' results; Figures 5A and 5B show the MO and LB talkers' results. The MO talkers' data are repeated in both figures for greater ease of comparison.

An examination of Figures 4 and 5 reveals no dramatic differences in vector length between clear and conversational speech tokens for any of the talker groups. Vector length appears slightly greater in clear than in conversational speech for / / and /e/ for the monolingual talkers (see Figure 4A). For the EB talkers, vector length appears slightly greater in clear than in conversational speech for /e, / and / /. For the LB talkers, vector length appears slightly longer in clear than in conversational speech for / , / and /e/, but to a lesser degree than for the other two groups. Overall, the modestly greater vector lengths in clear than in conversational speech are reflected in the significant effect of speaking style on vector length.

Post-hoc comparisons of the main effect of vowel showed significant differences in vector length among all of the vowels except between /i/ and /I/ and between /I/ and / /. The order of vowels from greatest to smallest vector length was as follows: / e, , I, i /.
Chapter Four

Discussion

Summary of Results

Both the MO and EB talkers were found to emphasize vowel duration differences between neighboring vowels in clear speech, as compared to conversational speech. To achieve this greater differentiation between neighboring vowels, the MO and EB talkers lengthened the "long vowels" (/e, ae, i/) in clear speech more than shorter vowels. The LB talkers, on the other hand, lengthened the vowels /e/ and /i/ less in clear speech than they lengthened the shorter vowels. Thus, the LB talkers were found to show less differentiation in duration between neighboring vowels in clear speech than in conversational speech.

At 50% of vowel duration, the relative positions and distances between vowels on the F1 axis are nearly identical for the MO and EB groups. Conversely, the maximum F1 distance between vowels appears reduced for the LB talkers as compared to the MO and EB talkers. In clear speech, the MO talkers decreased the F1 of the high vowels (/e, I, i/) and increased the F1 of the lower vowels. In other words, the high vowels got higher and the low vowels got lower, so that the vowel space expanded slightly on the F1 axis in clear speech. The EB and LB talkers did not reflect an overall decrease in the F1 of all
high vowels and increase in the F1 of all low vowels. The LB talkers did, however, show a fairly sizeable lowering of F1 for low vowels in clear speech.

In clear speech, the MO talkers increased the F2 of the front vowels and decreased the F2 of the back vowel, again so that the vowel space expanded slightly on the F2 axis. The EB talkers did not increase F2 for all front vowels. This may be due to EB talkers being "more clear" to begin with, so that little to no increase in F2 is seen in performance. Relative to the MO talkers, the EB talkers' front vowels tended to be more fronted in conversational speech, so perhaps it would have been difficult for them to achieve additional fronting of these vowels. The LB talkers also increased F2 slightly in clear speech (/e/ was the exception).

Both the MO and EB talker groups appeared to increase the length of the vector in the F1-F2 space from 20% to 80% of the vowel duration in clear speech for several vowels. The LB group showed a similar pattern, but differences were smaller in extent. Vector lengths appeared to be largely comparable for the MO and EB talkers in both styles, except that the vector lengths for / / appeared shorter for the EB than for the MO talkers in both styles. Vector lengths for / / appeared to be somewhat longer for the LB than for the EB talkers, but were shorter than those for the MO talkers. This cross-group difference in vector length for / / was apparently not consistent or large enough to result in a statistically significant effect.
Comparisons to Previous Studies

Hillenbrand et al. (1995) found that vowels showed characteristic differences in the direction and degree of change in formant frequencies measured from 20% to 80% of vowel duration. Of the vowels examined here, Hillenbrand et al. (1995) found that / , e / and / / had the greatest degree of spectral change and / / and /i/ had the smallest. The findings in this study showed that for monolingual talkers /e, / and / / had the greatest degree of spectral change and /i/ and / / had the smallest. For the EB and LB talkers, the main difference was that the vectors for / / for these two groups were more comparable in length to those of /i/ and / / (short vectors) than to those of /e/ and / /. These between-group differences were apparently not large or consistent enough to yield statistically significant differences between the groups, however.

The steady state (50% point) frequency values appear to be in similar locations and spacing for the MO and EB talkers as for the adult female talkers in Hillenbrand et al. (1995), except that / / is located lower in the vowel space in the present study than are the steady state values in Hillenbrand et al. (1995). The location of / / in the present study appears to be a better match with the steady state values of Peterson & Barney's (1952) female talkers, as reproduced in Hillenbrand et al. (1995), except that / / appears to be located lower in the
vowel space than / / for the talkers in the present study, whereas the two vowels are of approximately equal height in the Peterson & Barney (1952) data.

Ferguson & Kewley-Port (2002) examined formant frequency measures, degree of spectral change, and duration for ten target vowels produced in conversational and clear speech style by a single native speaker of American English. They found that in clear speech, F1 increased for all ten vowels. Conversely, the findings in the present study showed that F1 increased significantly for only / / and / / (for the monolingual talkers). The findings of Ferguson & Kewley-Port (2002) were similar to those of the present study in that F2 increased in front vowels (/e, , I, i/) and F2 decreased in the back vowel (/a/) in clear speech. In addition, in both studies the vowel space increased in clear speech for the monolingual talkers.

Ferguson & Kewley-Port (2002) found that vector length in the more crowded areas of the talker's vowel space was significantly greater in clear than in conversational speech. Of the vowels examined in the current study, / , / and /e/ did show slightly greater vector lengths in clear speech than in conversational speech (with some variation across talker groups). An overall significant positive effect of speaking style on vector length was also seen in the present study.

When examining duration, Ferguson & Kewley-Port (2002) found that all vowels were significantly longer in clear speech. Similarly, the results of the
present study also showed a significant positive effect of clear speech on duration of vowels for all three talker groups.

Limitations and Implications for Future Research

A limitation of this study is that only six vowels were studied. Future research should include all monophthongal vowels. It should be noted, however, that not everything measured in this study was analyzed; some data have been collected but not yet analyzed. Specifically, F0, F3, and F4 values for all target vowels, talker groups, and speaking styles are awaiting further analysis. In addition, data on spectral tilt may be gathered from this study.

Another limitation is that only ANOVAs were completed for this study. Ideally, a multivariate analysis of variance (MANOVA) should be completed. For example, the effects and interactions among the independent variables found in the present study might show different patterns when their effects on the relationships among the dependent variables are also explored.

Future research using these data should also include a correlational analysis between the acoustic variables and the intelligibility and degree of clear speech benefit shown for each talker. Individual differences across talkers in each group could be correlated with the acoustic measures from this study to determine which strategies used in clear speech result in the greatest intelligibility benefit.

Conclusion

One practical implication of this study for the speech-language pathologist (SLP) is the incorporation of these results for use in accent modification therapy
for Spanish-English bilinguals. The tendency of the LB talkers not to emphasize duration differences between neighboring vowels during clear speech suggests that they may be unaware of or unable to actively manipulate these differences. These differences might be drawn to the learner's attention during accent reduction therapy.

In addition, the vowels /I/ and /i/ were located very close to one another in the vowel space of the LB talkers. In the clear speech condition, the distinction between /I/ and /i/ for the LB group was essentially nonexistent. In fact, the LB talkers tended to crowd all four of the high to mid front vowels. Training Spanish-English bilinguals to better differentiate high to mid front vowels in production could be highly beneficial in improving their intelligibility. Possible approaches to this training include the use of visual aids, indirect feedback in the form of spectral displays of recorded vowels in the F1-F2 space, or direct articulatory feedback from ultrasound analysis of tongue position during vowel production.
References

Boersma, P., & Weenink, D. (2006). Praat: Doing phonetics by computer (Version 4.4.24) [Computer program]. Retrieved June 19, 2006, from http://www.praat.org/.

Bohn, O.-S., & Flege, J. (1997). Perception and production of a new vowel category by adult second language learners. In A. James & J. Leather (Eds.), Second-language speech: Structure and process (pp. 53-73). Berlin: Mouton de Gruyter.

Bradlow, A. R. (1995). A comparative acoustic study of English and Spanish vowels. Journal of the Acoustical Society of America, 97, 1916-1924.

Bradlow, A. R., & Alexander, J. A. (2007). Semantic and phonetic enhancements for speech-in-noise recognition by native and non-native listeners. Journal of the Acoustical Society of America, 121, 2339-2349.

Bradlow, A. R., & Bent, T. (2002). The clear speech effect for non-native listeners. Journal of the Acoustical Society of America, 112, 272-284.

Bradlow, A. R., Kraus, N., & Hayes, E. (2003). Speaking clearly for children with learning disabilities: Sentence perception in noise. Journal of Speech, Language, and Hearing Research, 46, 80-97.

CoolEdit 2000 (Version 1.1) [Computer software]. (2000). Phoenix, AZ: Syntrillium, Inc.

Dalbor, J. B. (1969). Spanish pronunciation: Theory and practice. New York, NY: Holt, Rinehart and Winston.

Ferguson, S. H. (2004). Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners. Journal of the Acoustical Society of America, 116, 2365-2373.

Ferguson, S. H., & Kewley-Port, D. (2002). Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners. Journal of the Acoustical Society of America, 112, 259-271.

Flege, J. E. (1991). The interlingual identification of Spanish and English vowels: Orthographic evidence. The Quarterly Journal of Experimental Psychology, 43, 701-731.

Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233-277). Baltimore, MD: York.

Flege, J. E., Bohn, O.-S., & Jang, S. (1997). Effects of experience on non-native speakers' production and perception of English vowels. Journal of Phonetics, 25, 437-470.

Flege, J. E., Schirru, C., & MacKay, I. R. A. (2003). Interaction between the native and second language phonetic subsystems. Speech Communication, 40, 467-491.

Helfer, K. S. (1997). Auditory and auditory-visual perception of clear and conversational speech. Journal of Speech, Language, and Hearing Research, 40, 432-443.

Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099-3111.

Hillenbrand, J., & Nearey, T. M. (1999). Identification of resynthesized /hVd/ utterances: Effects of formant contour. Journal of the Acoustical Society of America, 105, 3509-3523.

Johnson, K., Flemming, E., & Wright, R. (1993). The hyperspace effect: Phonetic targets are hyperarticulated. Language, 69, 505-528.

Kewley-Port, D., Akahane-Yamada, R., & Aikawa, K. (1996). Intelligibility and acoustic correlates of Japanese accented English vowels. ICSLP 96 (International Conference on Spoken Language Processing) Proceedings.

Krause, J. C., & Braida, L. D. (2002). Investigating alternative forms of clear speech: The effects of speaking rate and speaking mode on intelligibility. Journal of the Acoustical Society of America, 112, 2165-2172.

Ladefoged, P. (1982). A course in phonetics (2nd ed.). New York: Harcourt, Brace, Jovanovich.

MacKay, I. R. A., & Flege, J. E. (2004). Effects of the age of second language learning on the duration of first and second language sentences: The role of suppression. Applied Psycholinguistics, 25, 373-396.

Monsen, R. B., & Engebretson, A. M. (1983). The accuracy of formant frequency measurements: A comparison of spectrographic analysis and linear prediction. Journal of Speech and Hearing Research, 26, 89-97.

Payton, K. L., Uchanski, R. M., & Braida, L. D. (1994). Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. Journal of the Acoustical Society of America, 95, 1581-1592.

Peterson, G., & Barney, H. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24, 175-184.

Picheny, M., Durlach, N., & Braida, L. (1985a). Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech. Journal of Speech and Hearing Research, 28, 96-103.

Picheny, M., Durlach, N., & Braida, L. (1985b). Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech. Journal of Speech and Hearing Research, 29, 434-446.

Piske, T., MacKay, I., & Flege, J. (2001). Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics, 29, 191-215.

Rogers, C. L., Dalby, J. M., & Nishi, K. (2004). Effects of noise and proficiency level on intelligibility of Chinese-accented English. Language and Speech, 47, 139-154.

Roland VS-890 manual. Appendices, page 130.

Schum, D. J. (1996). Intelligibility of clear and conversational speech of young and elderly talkers. Journal of the American Academy of Audiology, 7, 212-218.

Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S. A., Nishi, K., & Jenkins, J. J. (1998). Perceptual assimilation of American English vowels by Japanese listeners. Journal of Phonetics, 26, 311-344.

Strange, W., Jenkins, J. J., & Johnson, T. L. (1983). Dynamic specification of coarticulated vowels. Journal of the Acoustical Society of America, 74, 695-705.

Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. Journal of the Acoustical Society of America, 88, 97-100.

United States Census Bureau (2000). 2000 U.S. Census. Retrieved June 25, 2007, from http://www.factfinder.census.gov.