Duration as a Phonetic Cue in Native and Non‐Native American English

This vowel study looks at the intricate relationship between spectral characteristics and vowel duration in the context of American English vowels, both from a native speaker (NS) and non-native speaker (NNS) perspective. The non-native speaker cohort is homogeneous in the sense that all speakers have Serbian as their mother tongue, but have been long-time residents of the US. The phonetic context investigated in this study is /bVt/, where V is one of the American English monophthongs /i ɪ u ʊ ε æ ʌ ɔ ɑ/. The results of the acoustic analysis show that the NNS vowels are generally longer than the NS vowels. Furthermore, NNSs neutralise the vowel quality of two tense and lax pairs of vowels, /i ɪ/ and /u ʊ/, and rely more heavily on phonetic duration when pronouncing them.


Introduction
The vowels of American English (AE) differ in their durations, and the different durations are often said to be phonetically realised as tense or lax. Even though vowel duration in American English has been thoroughly explored, it still offers fertile ground for further research in several different niches. First, there is intrinsic duration, studied by Black (1949) and Lehiste and Peterson (1961), who found that open vowels are longer than vowels produced with a narrower jaw opening. To illustrate this point, the American English vowel /ɑ/, which is more open, is intrinsically longer than the close vowel /i/. Another important and well-investigated phonetic characteristic of English vowels is the relationship between duration and the voicing of the following consonant (House and Fairbanks 1953; House 1961; Chen 1970). The vowel of bead is realised as longer than the vowel of beat, for instance. The lack of voicing in the following consonant shortens the vowel duration, all else being equal.
Second, some extralinguistic factors may influence vowel duration: vowels produced by female speakers are generally longer than the vowels produced by male speakers (Hillenbrand et al. 1995; Holt, Jacewicz, and Fox 2015). Sociolinguistic influences may also have an impact on vowel duration in English. In a recent study of Southern African-American English compared to White American English, Holt, Jacewicz and Fox (2015) claim that African-American speakers in the same geographical area as White Americans produce longer vowels.

Earlier Research on the Duration of American English Vowels
One of the early studies that aimed to establish the relationship between the frequency, intensity and duration of vowels characteristic of the reading style in American English was carried out by Black (1949). Here we will only point to the results that Black (1949) offered in relation to phonetic duration. A sample of 16 male speakers was analysed in a voiceless phonetic environment, where 11 English vowels were recorded in isolation at a specific pace in the context where the vowel is preceded by /t/ and followed by a voiceless /p/, in short /tVp/. This article provides a good basis for comparison with the corpus specifically designed for the purposes of the current study, which utilises a similar phonetic context, namely /bVt/. Table 1 provides the averages of the phonetic duration for nine vowels, which are read from a magnetic tape. (We only give data for the nine AE vowels that are the subject of the current study.)
THE SOUNDS OF ENGLISH
Peterson and Lehiste (1960) studied the influence of the consonants that precede and follow an AE vowel (a monophthong or diphthong). They used two sets of data, and here we will present the measurements only of the larger set, which involves 30 monosyllabic minimal pairs and 10 additional disyllabic minimal pairs (Peterson and Lehiste 1960, 693-94). Table 2 systematises data for the vowels that are the subject of the current study.
Table 2. Mean values of vowel duration produced by five speakers (adapted and taken from Peterson and Lehiste (1960, 702)).
Peterson and Lehiste (1960) delved further into every individual consonantal context, looking at the magnitude of influence of the preceding and following consonants. They arrived at the conclusion that the impact of the preceding consonant on the duration of the syllable nucleus is negligible, but the following consonant has a significant impact on the duration of the vowel. Table 3 provides the mean durations of short and long nuclei in different consonantal contexts. We have singled out only the stop consonants that close the syllable in question. It is evident that the ratio of the duration of vowels in front of voiceless and voiced consonants is approximately 2:3 for American English in favour of voiced segments. The importance of Peterson and Lehiste's research study lies in the fact that they systematically examined all phonetic environments and came to a robust conclusion that voicing is an important cue in distinguishing long and short vowels of AE.
Table 3. Mean values of short and long nuclei as a function of English stops (adapted and taken from Peterson and Lehiste (1960, 702)).
House's study (1961) contributes to the overall phonetic research in the sense that he investigated vowel durations across phonetic contexts, similar to Peterson and Lehiste (1960). Each of the 12 AE vowels is followed either by a voiced or voiceless consonant (three stops, one affricate and three fricatives) in the speech of three male talkers. The vowel speech sound occurs in a stressed syllable of a disyllabic nonsense word. House (1961) provides duration ratios for different phonetic contexts (voiced vs. voiceless consonants that affect vowel durations), different characters of vowels (tense vs. lax), and compares cumulative durations for groups of vowels (close/mid/open tense or lax) in different consonantal contexts (stop, affricate, fricative). House (1961, 1176) finds that there is a systematic progression in vowel durations depending on vowel features and phonetic environment. Table 4 gives an overview of values resulting from his study.
Table 4. Mean duration in ms (rounded to the nearest 10 ms) after voiceless consonants (taken and adapted from House (1961, 1176)).

Tense Lax
Close 150 120
Mid 170
Open 220 150
The longest vowel duration (400 ms) is observed in open tense vowels followed by voiced fricatives; stops shorten the vowel duration most, all else being equal, and affricates are positioned in between these two classes of consonants. By contrast, close lax vowels followed by a voiceless stop are characterised by the shortest duration, which at 100 ms is only one quarter of the longest.
Table 5. Mean duration in ms (rounded to the nearest 10 ms) after voiced consonants (taken and adapted from House (1961, 1176)).

Tense Lax
Close 320 220
Mid 350
Open 360 250
Hillenbrand, Clark and Houde (2000) came to interesting findings in their study of the effects of duration on vowel recognition in American English. The vowel groups /i-ɪ/, /u-ʊ/ and /ɪ-e-ε/ are minimally affected by duration because their spectral features are different enough to distinguish between them, unlike /ɑ-ɔ-ʌ/ and /e-æ/, which are significantly affected by variable duration.
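The voicing effect in Tables 4 and 5 can be expressed as a ratio of the two mean durations per vowel class. The sketch below is only an illustration: the dictionary names and the function `voicing_ratios` are hypothetical, the assignment of the first and second values in each row to tense and lax follows the column order of the tables, and the single mid value is kept as an unlabelled class of its own.

```python
# Mean durations in ms as listed in Tables 4 and 5 (after House 1961).
# Class labels are our reading of the tables: first value = tense,
# second value = lax; the mid row has a single value in each table.
VOICELESS_MS = {"close tense": 150, "close lax": 120, "mid": 170,
                "open tense": 220, "open lax": 150}
VOICED_MS = {"close tense": 320, "close lax": 220, "mid": 350,
             "open tense": 360, "open lax": 250}


def voicing_ratios(voiceless, voiced):
    """Return the voiceless/voiced duration ratio for each vowel class."""
    return {label: voiceless[label] / voiced[label] for label in voiceless}


ratios = voicing_ratios(VOICELESS_MS, VOICED_MS)
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}")
```

Under these assumptions every ratio falls below the approximate 2:3 proportion that Peterson and Lehiste (1960) report for stops, which is unsurprising given that the House values pool stops, affricates and fricatives.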
This paper looks into vowel duration as a cue that plays a role in distinguishing between American English vowels as produced by two groups of speakers, native and non-native. To the best of our knowledge, this is the first study of its kind that compares the American English spoken by Serbian expatriates with native speakers of this variety. The Serbian vowel inventory is traditionally described as one of the commonest vowel systems: a five-vowel system that comprises /i e a o u/. These five vowels are combined with four pitch accents (long and falling, long and rising, short and falling, and short and rising). Some more recent accounts, starting with Jakobson (1937 [1962]), propose a novel approach to the Serbian vocalic system, where quantity and pitch are factored out as two distinct dimensions. Such a view was adopted in several other studies on Serbian pitch accent (Browne and McCawley 1965, Inkelas and Zec 1988). We will adopt the latter approach in this vowel study and regard Serbian as a quantity language following Lehiste (1970), who claims that short and long vowels may also differ in their spectral characteristics (see also Čubrović 2016, 26-29).

Participants
Ten native speakers of Serbian who live in the United States and five native speakers of American English took part in the experiment. All ten participants are male.
At the beginning of the recording session, each participant was required to fill in a questionnaire. The Serbian participants were asked to report the length of residence (LOR) in the United States and language(s) spoken at home. The Serbian participants were also asked to rate their own English fluency on a scale (1-5, 5 being the highest) at the time of relocation from Serbia and at the time of the recording in the States. All ten participants were born in Belgrade, Serbia (except for one participant who was born in the south of Serbia, but lived in Belgrade for 27 years prior to moving to the US), and continued to live in Belgrade until they moved to the States. They all live in Atlanta, GA, and their ages range from 35 to 44. Nine of them had lived in Atlanta for more than 12 years at the time of the recording. Seven out of ten speakers mostly speak Serbian at home, and all participants use exclusively English at work.
Native speakers of English were asked to report on their place of residence and languages spoken. All five lived in the North-East of the United States at the time of the recording. Three were undergraduate students at Cornell University, Ithaca, NY, and two were employees (former and present) of the same university. Table 6 summarises this information.

Materials and Recording Procedures
The acoustic experiment targets nine vowels of American English (AE) in the following monosyllabic words: beat, bit, bet, bat, but, boot, put, bought and pot. The words were all embedded in the frame sentence "Say ___ again", and repeated three times in a random order, giving a total of 270 (10 speakers x 3 repetitions x 9 vowels) tokens for the native speakers of Serbian (the NNS group) and 135 (5 speakers x 3 repetitions x 9 vowels) tokens for the native speakers of English (the NS group), totalling 405 tokens.
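The token counts follow directly from the recording design. As a quick check, the arithmetic can be sketched as below; the variable and function names are ours and purely illustrative.

```python
# Recording design as described in the text.
WORDS = ["beat", "bit", "bet", "bat", "but", "boot", "put", "bought", "pot"]
REPETITIONS = 3
SPEAKERS = {"NNS": 10, "NS": 5}  # Serbian speakers, native AE speakers


def tokens_per_group(n_speakers, n_repetitions=REPETITIONS, n_words=len(WORDS)):
    """Number of recorded tokens for one speaker group."""
    return n_speakers * n_repetitions * n_words


token_counts = {group: tokens_per_group(n) for group, n in SPEAKERS.items()}
total_tokens = sum(token_counts.values())
print(token_counts, total_tokens)
```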
All NNS recordings were made using Sennheiser noise-cancelling headphones and a Sony laptop computer running Praat, Version 5.3.51 (Boersma and Weenink 2013). The NSs of American English were recorded in a sound-attenuated booth in the Cornell University Phonetics Lab. Participants were given sets of sentences in a PowerPoint presentation, and only one sentence was presented on a slide at a time. They were also given the opportunity to familiarise themselves with the sentences before the recording started. After they had become acquainted with the materials, the participants were instructed to read the sentences "as naturally as possible".

Analysis and Discussion
The recordings were digitised at 22,000 Hz and analysed using the Praat software for acoustic analysis of speech (Boersma and Weenink 2013). All elicited materials were first manually labelled, and the segmental acoustic features of the vowels were then measured with a script written by DiCanio (2013). This script generated eight acoustic measures: vowel duration, F1, F2, F3, centre of gravity, standard deviation, skewness, and kurtosis. Even though duration is the focus of the present study, F1 and F2 will also be shown in order for vowels to be fully analysed.
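The last four measures are the first four spectral moments. As a rough illustration of what they capture, the sketch below computes them from a power spectrum given as two parallel lists. It is not the script used in the study: the function name `spectral_moments` is hypothetical, the spectrum is a toy example, and reporting kurtosis as excess kurtosis (minus 3) is one common convention that we assume here.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (centre of gravity, SD, skewness, excess kurtosis) of a spectrum.

    freqs_hz and power are parallel sequences; power acts as the weight
    of each frequency bin.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    cog = sum(f * p for f, p in zip(freqs_hz, power)) / total

    def central_moment(k):
        return sum(((f - cog) ** k) * p for f, p in zip(freqs_hz, power)) / total

    variance = central_moment(2)
    sd = math.sqrt(variance)
    if sd == 0:
        return cog, 0.0, 0.0, 0.0
    skewness = central_moment(3) / sd ** 3
    kurtosis = central_moment(4) / variance ** 2 - 3  # excess kurtosis (assumed)
    return cog, sd, skewness, kurtosis


# Toy symmetric spectrum: energy centred on 2000 Hz.
cog, sd, skewness, kurtosis = spectral_moments([1000.0, 2000.0, 3000.0],
                                               [1.0, 2.0, 1.0])
print(cog, sd, skewness, kurtosis)
```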

High Vowels /i ɪ/
The first pair of vowels are those of beat and bit. In AE, they are most often described as tense and lax, respectively. NSs clearly differentiate them by vowel quality, which is shown in Figure 1. NNSs have a tendency to merge /i/ and /ɪ/. This phenomenon may be accounted for by the fact that speakers with a Serbian language background rely heavily on phonetic duration when distinguishing between these two vowels. They transfer this phonetic property from L1 (Serbian) into L2. The merger is not observed in speaker SG, who has the longest LOR in the States (23 years). Similarly, the acoustic characteristics of /ɪ/ of speaker MS, whose LOR is 15 years, approximate the NSs' production of this vowel.

The duration measurements for the native and non-native participants are looked at next. The mean value for the tense vowel /i/ in the native speaker group is 112.5 ms (SD 15.1 ms). Its lax counterpart has a mean value of 88.6 ms (SD 14 ms) in the same group of participants. The non-native speakers' tense vowel is significantly longer at 132.51 ms and with a larger SD (30.05 ms). The non-native speakers' lax vowel duration is only marginally different from the native speakers' average duration measurement at 87.6 ms (SD 20.74 ms).
It is worth noting that the standard deviations are markedly higher in the non-native speaker group, which points to vowel duration instability in this participant group.

High Vowels /u ʊ/
The next pair of vowels are the vowels of boot and put. In AE, /u/ is tense and /ʊ/ is lax. NSs clearly separate the two vowels in the vowel area, which is shown in Figure 3. For native speakers of English there is no overlapping of /u/ and /ʊ/ where vowel quality is concerned. NNS vowels are realised differently, and their tendency to merge the two vowels is reported in this vowel study. The non-native speaker participants consistently produce F1 of /ʊ/ with a lower frequency. This makes the lax /ʊ/ a higher vowel and closer to /u/ in the NNS group. NNSs appear to disregard the quality difference between the two English vowels and rely more on phonetic duration, similar to the /i/-/ɪ/ pair, which is displayed in Figure 4 below.
As for the duration measurements, NS tense /u/ is on average 125 ms long (SD 15.8 ms), while its lax counterpart is 73.8 ms long (SD 9.9 ms). NNSs tend to produce the tense vowel as even longer. The mean value of /u/ in the NNS group amounts to 154.5 ms (SD 35.9 ms). The lax /ʊ/ is realised as significantly shorter, and its average value in the NNS group is 80.3 ms (SD 12 ms). To conclude, NNSs seem to rely on phonetic duration when producing the AE tense /u/. The SD for the NNS vowel is also larger when compared to the NS realisation of this vowel, which points to vowel duration instability and hesitation on the part of the NNSs.
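The reliance on duration in the two tense/lax pairs can be made more explicit by comparing the tense-to-lax duration ratio and the relative spread (SD divided by mean) in each group. The sketch below uses only the means and SDs reported above for beat, bit, boot and put; the names `DURATIONS_MS`, `tense_lax_ratio` and `coefficient_of_variation` are ours, and the calculation is a descriptive illustration rather than a statistical test.

```python
# (mean, SD) in ms for each carrier word, as reported in the text.
DURATIONS_MS = {
    "NS": {"beat": (112.5, 15.1), "bit": (88.6, 14.0),
           "boot": (125.0, 15.8), "put": (73.8, 9.9)},
    "NNS": {"beat": (132.51, 30.05), "bit": (87.6, 20.74),
            "boot": (154.5, 35.9), "put": (80.3, 12.0)},
}
TENSE_LAX_PAIRS = [("beat", "bit"), ("boot", "put")]


def tense_lax_ratio(group, tense, lax):
    """Mean duration of the tense vowel divided by that of the lax vowel."""
    return DURATIONS_MS[group][tense][0] / DURATIONS_MS[group][lax][0]


def coefficient_of_variation(group, word):
    """SD relative to the mean, a unit-free measure of spread."""
    mean, sd = DURATIONS_MS[group][word]
    return sd / mean


for tense, lax in TENSE_LAX_PAIRS:
    for group in ("NS", "NNS"):
        ratio = tense_lax_ratio(group, tense, lax)
        cv = coefficient_of_variation(group, tense)
        print(f"{group} {tense}/{lax}: ratio {ratio:.2f}, CV of {tense} {cv:.2f}")
```

On these figures the tense-to-lax ratio is larger in the NNS group for both pairs, and so is the relative spread of every vowel, which is in line with the interpretation given above.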

The Vowels /ε æ ʌ/
In the next group of vowels, we first investigate the phonetic characteristics of two vowels, /ε/ and /æ/. It has been noted that there is more variation in the acoustic vowel space for /ε/ and /æ/ even in the group of NSs. There is a general tendency, though, for /ε/ to be produced with a lower F1, which makes it a higher vowel than /æ/ in AE native speech. The three tokens of all three vowels that have consistently lower values of F1 are all produced by speaker MB. This may be due to his vocal tract length, which tends to be greater in taller people. This reduction in F1 values in one speaker may be seen as his idiosyncratic characteristic.
In order to provide a comprehensive overview of AE monophthongs in this paper, the vowel /ʌ/ is also displayed in Figure 5, but it clearly does not overlap with /ε æ/ in the native speaker group.
The NNS participants in this study tend to merge /ε/ and /æ/, i.e. they do not clearly differentiate between the two. The vowel /æ/ is a new sound to Serbian language speakers. A similar finding can be observed in other Slavic languages. Slovenian learners of English, for example, find this vowel contrast in BE the most challenging (Stopar 2015, 89; Komar 2017, 163). However, there are nine tokens of the bat vowel in Figure 6 that have higher F1 values, and they are repetitions by three speakers, SG, NN and UZ. Two of these participants have a relatively long LOR in the States, of 23 and 16 years. The speaker NN has lived in the States for 13 years, which is slightly under the mean value for LOR in the study (13.5 years). These nine tokens of bat approximate the NS spectral characteristics.
As for vowel duration, the three English vowels are given here in ascending order, from the shortest to the longest: /ε æ ʌ/. The average values for the NS vowels are consistently shorter than those of the NNSs. As expected, the standard deviations are larger in the NNS group. All duration measurements for the three vowels are given in Table 7.
Figure 5. NS /ε æ ʌ/.

The Vowels /ɔ ɑ/
Both /ɔ/ and /ɑ/ are described as back vowels in AE. Neutralisation of these two vowels is observed in many regions of the US and Canada, and is known as the cot-caught merger. As a result of this phonological process, the two vowels become one (see Čubrović 2017; 2018).
The NS participants of this study mostly differentiate the vowels in question, i.e. the vowels in the words bought and pot, and the NNSs do the same. However, the /ɔ/ vowel is articulated with a lower F2 by most subjects in the NNS cohort. This implies that the NNS /ɔ/ is a more back vowel than in native AE speech.
The vowel of pot shares the same phonetic characteristics as /ɑ/ in the two groups of participants of this study. The values for both formants are lower in NNSs. This vowel is, therefore, produced as a higher vowel and with a greater degree of backness. A brief look at the duration measurements provides an interesting finding: the vowel of bought is significantly longer in the NNS group at the mean value of 199.5 ms (SD 56.9 ms) compared to the NSs' mean value of 156 ms (SD 14 ms). This is consistent with the findings for the tense /i u/, where NNSs also rely heavily on vowel length.
The vowel of pot is shorter in the NNS group (117.3 ms, SD 23 ms) than in the NS group of participants (123.2 ms, SD 13.7 ms), which runs counter to the general tendency for NNS vowels to be longer than NS vowels.

General Discussion and Conclusions
The tables that follow summarise the mean values of the vowel duration for all nine vowels investigated in this research study in both NS and NNS groups. The non-native speaker group data is given first, followed by the average duration measurements for the native speaker group. The third row displays the duration ratio for the two groups of participants (NNS vowel duration/NS vowel duration). The vowel duration ratio analysis shows that NNS vowels are consistently longer in duration than their NS counterparts, with the exception of /ɪ/ and /ɑ/. This is also shown in Figures 9 and 10.
Figure 9. NS vowel duration with SD.
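The duration ratio described above can be recomputed for the six vowels whose means are quoted in the running text (the values for /ε æ ʌ/ appear only in Table 7 and are therefore left out). This is a minimal sketch with illustrative names of our own (`MEAN_DURATION_MS`, `nns_ns_ratio`); vowels are keyed by their carrier words.

```python
# (NNS mean, NS mean) in ms, as quoted in the text for each carrier word.
MEAN_DURATION_MS = {
    "beat": (132.51, 112.5),
    "bit": (87.6, 88.6),
    "boot": (154.5, 125.0),
    "put": (80.3, 73.8),
    "bought": (199.5, 156.0),
    "pot": (117.3, 123.2),
}


def nns_ns_ratio(word):
    """NNS mean duration divided by NS mean duration for one word."""
    nns_mean, ns_mean = MEAN_DURATION_MS[word]
    return nns_mean / ns_mean


ratios = {word: nns_ns_ratio(word) for word in MEAN_DURATION_MS}
shorter_in_nns = [word for word, ratio in ratios.items() if ratio < 1]

for word, ratio in ratios.items():
    print(f"{word}: {ratio:.2f}")
print("Shorter in NNS group:", shorter_in_nns)
```

A ratio above 1 means the NNS vowel is longer on average. With these values only bit and pot fall below 1, matching the two exceptions /ɪ/ and /ɑ/ noted above.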
One of the significant findings of this small-scale vowel study is the neutralisation of the vowel quality of several vowels in the NNS production, which leads to a heavy reliance on phonetic duration as the single most important phonetic cue. The tense vowels /i/ and /u/ are reported to undergo a kind of spectral merger with their lax counterparts /ɪ/ and /ʊ/ in the NNS production; NNSs therefore have to rely more heavily on phonetic duration. It is a matter of debate why NNSs articulate /ɔ/ as a very long vowel in AE. One of the reasons might be the influence of the British English that these speakers were taught at school, where /ɔ:/ and /ɒ/ form a vowel pair, similar to /i: ɪ/ and /u: ʊ/. Last but not least, the open vowel /æ/ is intrinsically long, so both native and non-native speakers produce it this way. As a new sound to the NNS group, its vowel quality is somewhat more difficult to acquire, but its universal vowel duration is a good place to start.