Trainee Translators’ Perceptions of the Role of Pronunciation and Speech Technologies in the Technology‑Driven Translation Profession

We live in a world of rapid technological advances which constantly affect the work of professional translators. Suitable training is therefore required for future translators to be able to compete on the translation market. With the rise of translation technologies, new ideas have been put forward on how to make translators faster and more efficient. Among the technologies that future translators may not be adequately familiar with are speech recognition tools; these enable translators to dictate their sight translation and have it typed out, allowing more time to focus on the content. However, as with all digital tools, the quality of input is important; a question thus arises as to the role pronunciation plays in such work. The present study aimed to establish how aware trainee translators are of the possibilities afforded by speech technologies and to explore their perceptions of the role played by pronunciation.


Introduction
The impact of new technologies on translation work over the last few decades has significantly changed the way people perceive the work of professional translators. The usual translator's workstation or translator's workbench no longer involves working only with computers and computer-assisted translation (CAT) tools, but may, under certain conditions, also involve working with machine translation (MT) and speech recognition technologies. According to a Stanford study (cf. Weiner 2016¹), speaking is much faster than typing on a touchscreen, while typing on a computer keyboard is seemingly easier and faster. However, even a few years ago speech recognition software was criticised for its error-prone performance, which inevitably led to too much time being spent on correcting mistakes. It therefore seemed reasonable to assume that professionals who use a keyboard as part of their daily routine, translators included, would not be inclined to integrate into their work technologies which actually slow them down. However, a lot has changed since then: Nuance has produced Dragon Speech Recognition software, one of the leading speech recognition technologies, and claims that it is now able to transcribe up to 160 words per minute, which is about three times faster than typing, with an enviable 99% recognition accuracy (cf. Dragon NaturallySpeaking²). This suggests that speech technologies are now much more effective, and can perhaps make translation work more efficient. Moreover, any technological advantage is worth exploring to ensure that professional translators remain competitive on the translation market.
With the swift rise of digital innovations and artificial intelligence (AI), increasing effort will continue to be put into speech technologies for translation undertakings, at least for fairly basic communication purposes and simple translation tasks, with the aim of establishing basic contact and easing communication for those who do not speak a particular language. Students might already be aware of the possibilities afforded by virtual AI speech assistants such as Amazon's Alexa, Microsoft's Cortana, Google's Assistant or Apple's Siri, and might have tried using such services. Large brands are all investing heavily in voice technologies, which are associated with a growing number of applications (for more details on virtual assistants, see Moren 2018). Armour (2018) reports on the data provided by Adobe Analytics, which indicates that "71% of owners of smart speakers like Amazon Echo and Google Home use voice assistants at least daily" [...] with "44% using them multiple times a day" while "[o]ver 76% of smart speaker owners increased their usage of voice assistants in the last year". Armour (2018) also quotes Steve Rabuchin, VP of Amazon Alexa, who stated that the vision they have for their customers is to "be able to access Alexa whenever and wherever they want. This means customers may be able to talk to their cars, refrigerators, thermostats, lamps and all kinds of devices in and outside their homes". Armour (2018) believes that "voice is the future of how brands will interact with their customers". These virtual assistants are all monolingual, however, and do not engage in multilingual communication. Even so, "[t]o build a robust speech recognition experience, the artificial intelligence behind it has to become better at handling challenges such as accents and background noise. And as consumers are becoming increasingly more comfortable and reliant upon using voice to talk to their phones, cars, smart home devices, etc., voice will become a primary interface to the digital world and with it" (Armour 2018).
¹ Cf. https://www.popularmechanics.com/technology/a22684/phone-dictation-typing-speed/.
² Compare with data provided by Nuance at https://www.nuance.com/dragon/industry/education-solutions.html.
Virtual assistants no longer work only with English³; Cortana, for example, is currently also available in Chinese, French, German, Italian, Japanese, Portuguese and Spanish versions, making these voice technologies increasingly accessible to a much wider audience. Even regular dictation services available to Windows and Mac users have the option of choosing between language varieties, with American, Australian, British or Canadian English, for example, already embedded while, depending on the tool, other varieties can easily be downloaded from the Internet. However, more time may be required to have languages of lesser diffusion successfully integrated into existing systems. Slovene is a language spoken by only about two million people, and is thus less likely to be automatically added alongside major language options. However, there are some speech recognition tools available, such as Voice Notepad, which already have Slovene embedded, and their dictation performance is relatively accurate. This is in contrast to the Google Translate dictation option, where the quality of translation is often still highly questionable and the final output is more often than not inadequate and unusable. There is even a virtual AI assistant, SecondEGO, designed by Amebis, and several other systems are available for Slovene, which were originally created on the basis of large corpora and other language resources, such as the speech-to-speech communicator VoiceTRAN or eBralec (eReader): the direction, however, is speech-to-speech or written to spoken rather than spoken to written, which would be most suitable for translators. Moreover, these technologies are only available commercially or for research purposes (cf. Sepesy Maučec et al. 2009; Donaj and Kačič 2012; Žgank and Sepesy Maučec 2010; Žgank, Verdonik, and Sepesy Maučec 2016, to name just a few), while their non-commercial availability is still a matter for the future.
³ For more on English and its relative share online see Holly Young's article available at http://labs.theguardian.com/digital-language-divide/ and Laura Gonzales' article available at http://uxpamagazine.org/improving-digitaltranslation/.
Still, none of these technologies are directly applicable to regular translation work, as they are aimed at the general public to ease their daily routines. None of the virtual assistants can ease the tedious task of typing that translators regularly have to undertake; translators thus need more specialised translation tools to facilitate their work (cf. Cronin 2013). One option that could possibly aid their daily routines and reduce the need for constant typing is dictation. Combined with sight translation, it could change the way translation is habitually performed. It might thus be worth investigating the usability of speech-to-text technologies in translator training, foregrounding the time-efficiency ratio in particular. The awareness of trainee translators of the role of pronunciation and their familiarity with speech recognition technologies deserve research attention, in order to establish whether the application of such technologies could be motivating and beneficial for future translators.

Literature Review
Professional translation work is usually associated with written output. However, the spoken modality should not be neglected in today's information society and its digital world, so heavily imbued with multimodality. It is therefore worth exploring the issues in translator training that address these modalities, the spoken one included, especially since, within the scope of interpreter training, Shlesinger (1995, 193-214) already maintained that "one modality can teach us about the constraints, conventions and norms of the other". This suggests that sight translation, a bridge between the oral and written modes of translation (cf. Agrifoglio 2004), should perhaps play a more prominent role not only in professional translation, but also in translation pedagogy.
So far, sight translation has been recognised as relevant in interpreting studies and interpreting pedagogy (cf. Agrifoglio 2004; Angelelli 1999; Li 2014; Gile [1995] 2009; Gonzalez, Vásquez, and Mikkelson 2012; Jimenez Ivars 2008; Lambert 2004; Mikkelson 1994; Moser-Mercer 1995; Pöchhacker 2004, 2010; Riccardi 2002; Shlesinger 1995; Song 2010; Viaggio 1995; Viezzi 1990; Weber 1990). Although there is still a fairly small body of literature focusing on the advantages of sight translation for written translation (cf. Baxter 2016; Dragsted, Hansen, and Sørensen 2009; Dragsted, Mees, and Hansen 2011; Gorszczyńska 2010; Mees et al. 2013), a recent study has shown (cf. Hirci, Mikolič Južnič, and Pisanski Peterlin forthcoming) that engaging in sight translation for the purposes of written translation can result in creative, novel translation solutions, which adds value to the translation process and can make the entire process of translating much faster and more efficient. Some scholars have already explored the application of dictation in sight translation and foregrounded its benefits for translation work in terms of time efficiency (cf. Biela-Wolonciej 2007). Possible advantages were also reported by Dragsted, Mees, and Hansen (2011), who compared written and sight translation output with and without speech recognition software.
They concluded that with additional training and better familiarity with speech recognition tools, "greater time savings and higher quality are likely to be achieved as technical obstacles are either reduced or overcome" (Dragsted, Mees, and Hansen 2011, 26). Baxter (2016) also investigated the application of sight translation skills to written translation combined with speech recognition; although there were no considerable time differences for the two studied groups, idiomaticity was enhanced, suggesting that combining sight translation with speech recognition "improves the spontaneity of the final text, thereby producing a more natural-sounding translation than the traditional W2W¹⁰ method" (Baxter 2016, 14). However, the most interdisciplinary approach was adopted in a study by Mees et al. (2013) where close collaboration among phoneticians, translators and interpreters yielded sound grounds for further interdisciplinary cooperation, proving that speech recognition technologies¹¹ can be successfully applied in translator training.
In Slovenia, no study has been carried out on having speech recognition technology fully integrated into translation work, focusing on a hybrid which "involves crossing borders between translation and interpreting since the translation is produced orally, as in interpreting, but is visible on the screen, as in translation" (Mees et al. 2013, 141). An introductory course on English phonetics and phonology for translators is offered in year one of the undergraduate programme at the Department of Translation Studies at the University of Ljubljana to help students improve their pronunciation. As the advances in speech-to-text technology are relatively recent, students enrolled in the course may not be familiar with the relevance of pronunciation skills in technological applications, and may perceive pronunciation to be more important for interpreters than for translators. Yet this issue is particularly relevant for those who may wish to use software which is heavily reliant on one's pronunciation. While Nuance claims 99% accuracy for its software, it needs to be acknowledged that such accuracy is only possible if one's pronunciation is also highly accurate; otherwise the success rate of speech recognition is much lower. Near-native and intelligible pronunciation is required for dictation systems to work well, at least for the time being; otherwise the rate of mistakes due to mispronunciation is too great for such tools to be considered effective. However, the potential relevance of pronunciation skills for the trainee translators' work in the translation modules offered later as part of the graduate programme in Translation/Interpreting has not yet been addressed, as none of the specialised translation courses involve working with speech recognition technologies.
As there are built-in dictation options available on computers (both for Windows and Mac users) that enable working with English, translation modules focusing on translation from L1 to L2 could possibly benefit from integrating this technology into their regular translation instruction. In their study, Mees et al. (2013) also report on working into L2 (cf. also Dragsted, Mees, and Hansen 2011). In Denmark, and the rest of Scandinavia, where, according to Phillipson (2003, 96), there are "good grounds for referring to English as a second language rather than a foreign language", working into L2 is not perceived as unusual. Danish and Slovene are comparable in this respect, as both can be considered languages of lesser diffusion, so L1 to L2 translation (cf. Pokorn 2005; Hirci 2012) is not uncommon in Slovenia either. In fact, children in Slovenia start learning English as part of their primary school curriculum at the age of six. Films and TV shows are regularly subtitled rather than dubbed, and Slovene translators work in both directions, L2 to L1 as well as L1 to L2. Many professional translators in Slovenia find themselves in a position where they are required to undertake translation into L2, English in particular, on a regular basis, since there is a serious shortage of native English speakers working with Slovene. Training in the L2 direction is thus necessary and is offered as part of the translator training curriculum at the Department of Translation Studies at the University of Ljubljana.
¹⁰ W2W means written to written translation.
¹¹ For more details on speech recognition technology see Jurafsky and Martin (2000).

Future Prospects: More Work with Speech Recognition Systems?
So far, no research has been undertaken in Slovenia to explore working with speech recognition systems focusing on time efficiency in translation. However, a study was carried out on the possible benefits of applying speech recognition technologies in the pronunciation training of non-native speakers of English. Šuštaršič (2005, 87) investigated some software packages to explore their "usability within an English phonetics curriculum for EFL learners at the university level" that can be applied to pronunciation training. Šuštaršič (2005, 93-97) suggested that "speech recognition can be applied in phonetics (or more precisely, in pronunciation) teaching, and that a number of aspects of articulatory and auditory phonetic principles can be observed in the way that speech recognition programs transfer (or fail to transfer) the received speech signals into written form." He pointed out that "using any speech recognition program with English pronunciation students has several other justifications. Firstly, the program needs to be trained to one's voice, which requires a great deal of loud reading. […] The basic rule is: the more you train the program (i.e. the more you read), the higher will be the accuracy of recognition, and thus the usefulness of the program for any practical task." Šuštaršič (2005, 98) also suggested that students can be encouraged to record their own speech and apply a speech recognition programme to convert it into a written text, an idea which in itself is closely related to sight translation from Slovene into English. Šuštaršič (2005) reported working with commercial speech recognition technologies such as Via Voice and Dragon's NaturallySpeaking, which, however, are not freely available. A cost-free option nowadays is to simply activate the built-in dictation option on the computer (either for Windows or Mac users), as it comes at no additional price, and explore its usability before obtaining some more sophisticated commercial software.
Drawing on Mees et al. (2013) and Šuštaršič (2005), a study was thus conceived to explore the possible benefits of using speech technologies in translator training for two reasons:
• to improve trainee translators' pronunciation,
• to use speech instead of typing to speed up the process of translation.

Study Design and Methodology
The present study was designed to explore the trainee translators' perceptions of the role of English pronunciation, as well as their familiarity with speech recognition tools, to establish whether or not it might be viable to introduce such technologies into translator training at the University of Ljubljana.

Methodology and Participants
An online questionnaire was designed for the purposes of the present study to foreground the perceptions of both undergraduate and graduate trainee translators studying at the Department of Translation Studies at the Faculty of Arts, University of Ljubljana, in the academic year of 2018/2019.

Data Collection
The questionnaire was made available online for 18 days, between 4 January 2019 and 22 January 2019, with a total of 94 participants taking part in the study. The questionnaire, designed using the online survey tool Google Forms, consisted of 18 questions. The first part of the survey aimed to collect general information about the participants, eliciting data on their age, gender and year of study. The second part of the questionnaire explored the participants' perceptions and self-awareness of their own pronunciation and their familiarity with the existing speech-to-text technologies that might prove to be useful in their future profession.
The trainee translators were asked to respond to several statements referring to their perceptions of the role of pronunciation in English and their aspirations to improve it. These comprised a total of nine questions with yes/no answers and four statements using a five-point Likert-type scale:
• the participants' motivation to have good pronunciation of English, from "totally unmotivated" = 1 to "extremely motivated" = 5,
• how important they find pronunciation in relation to other language skills, from "the least important" = 1 to "the most important" = 5,
• how they would rate their own pronunciation at the time of filling out the questionnaire, from "extremely poor" = 1 to "excellent" = 5,
• how much they aspire to have a near-native pronunciation of English, from "do not aspire to this at all" = 1 to "aspire to this 100%" = 5.
Additional information on the existing speech recognition tools and the students' experience with the application of these technologies to their work was elicited using a number of multiple choice questions. The participants were also encouraged to provide additional comments on the possible benefits of using speech-to-text technologies in the final section of the questionnaire.

Results and Discussion
This section reports on the results of the questionnaire completed by the participants of the study. First, general demographic information on the participants is provided, followed by the data related to their pronunciation and awareness of speech recognition technologies. Due to the limited scope of this paper only those results that directly address the topic are discussed in detail.

General Information on the Participants
The study involved 94 participants, all of whom completed the questionnaire in full. All of the participants are either undergraduate BA students of Interlingual Mediation, or graduate MA students of Translation/Interpreting at the University of Ljubljana (cf. Figure 1). Of the 94 participants, 76 were female and 18 were male, and all were aged between 17 and 26 (mean age 21).
Most participants (41, i.e. 43.6%) are enrolled in year 1 of the BA in Interlingual Mediation, with 14 (14.9%) respondents from year 2 of the BA in Interlingual Mediation, and 16 (17%) respondents from year 3 of the BA in Interlingual Mediation (cf. Figure 1). At the graduate level, there were 15 (16%) participants from MA I in Translation, three (3.2%) from MA I in Interpreting, and five (5.3%) from MA II in Translation (there is no MA II in Interpreting available for this academic year).
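The percentages reported above follow directly from the participant counts. As a minimal sketch, assuming only the counts given in the text (the group labels and the helper name `share` are illustrative and not part of the original study), the shares can be recomputed as follows:

```python
# Participant counts per year of study, as reported in the text above.
# The dictionary keys are shorthand labels introduced for this sketch.
counts = {
    "BA year 1": 41,
    "BA year 2": 14,
    "BA year 3": 16,
    "MA I Translation": 15,
    "MA I Interpreting": 3,
    "MA II Translation": 5,
}

# The groups should add up to the full sample of participants.
total = sum(counts.values())


def share(n, total):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {group: share(n, total) for group, n in counts.items()}

for group, pct in shares.items():
    print(f"{group}: {counts[group]} participants ({pct}%)")
```

The six groups add up to the full sample of 94, and the recomputed shares agree with the reported figures to one decimal place.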

Importance of speaking English well
As evident from the results of the questionnaire, all of the participants believe that it is important to speak English well to make a good impression on their clients and employers, and all but one believe the same is important to be a successful interpreter, while 88 out of 94 participants (i.e. 93.6%) were of the opinion that this is also important for translators (cf. Hirci 2017). In addition, 90 (95.7%) respondents think that it is important to speak well to sound professional, and 83 (88.3%) to be able to use speech recognition tools more easily.

Significance of speaking English well
The participants seem to have rather diverse views on what speaking English well actually means. Most of the participants, i.e. 89 (94.7%), agreed that this means having pronunciation which is intelligible and easy to understand, with 65 (69.1%) believing it means speaking with an accent which is close to standard varieties of English. Fewer than half of the respondents (45 or 47.9%) believe that this means having a native-like pronunciation.

Motivation to have a good pronunciation of English
The questionnaire yielded an insight into the participants' motivation with regard to having good pronunciation: the results show that over half of the participants (52 or 55.3%) are extremely motivated and an additional 30 (31.9%) are very motivated to have a good pronunciation of English (a mean score¹² of 4.4, cf. Table 1), which confirms that the respondents regard having good pronunciation in English as essential for their future profession. When asked about how important they find pronunciation compared to other language skills, the participants showed considerable agreement that pronunciation skills are quite important (a mean score of 3.8).
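The full distribution of answers to the motivation statement is not given in the text, so the mean cannot be recomputed exactly. A rough plausibility check is still possible: the sketch below assumes only the two reported counts (52 answers of 5 and 30 answers of 4 out of 94) and bounds the mean by letting the remaining answers all fall on 1 or all on 3. The variable names are illustrative.

```python
# Known answers on the five-point scale (scale point -> number of respondents),
# taken from the counts reported in the text above.
N = 94
known = {5: 52, 4: 30}

# Weighted sum of the known answers and the number of unreported answers.
known_sum = sum(point * n for point, n in known.items())
remaining = N - sum(known.values())

# The unreported answers must lie on scale points 1 to 3,
# because points 4 and 5 are already fully accounted for.
lowest = (known_sum + remaining * 1) / N
highest = (known_sum + remaining * 3) / N

print(f"unreported answers: {remaining}")
print(f"mean lies between {lowest:.2f} and {highest:.2f}")
```

The reported mean of 4.4 falls inside this interval, so it is consistent with the two counts given for the motivation statement.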

¹² The central tendency for each Likert-type statement was summarised using the mean score.
The participants' replies furthermore revealed that they tend to aspire to have English pronunciation which is intelligible yet close to one of the English standards. They deemed their own pronunciation at the time of filling out the questionnaire as only "good" or "fairly good", while only two participants considered it "excellent". Three participants even believed their pronunciation was "extremely poor" or "rather poor" (mean score 3.5). The responses revealed that over half (54.3%) of the participants have extremely high aspirations to improve their pronunciation, and an additional 29.8% of the participants have high aspirations to improve it (mean score 4.3).
These results are quite valuable, as they reveal that most participants are aware of the significance of having a good pronunciation of English. Whether they see a correlation with speech-to-text technologies, however, is yet to be explored. As clear, accurate and intelligible pronunciation is required to have speech recognition systems work well, at least for the time being, improving non-native English pronunciation is undoubtedly worth investing time and effort into if we also wish to gain from the advantages afforded by such technologies.

Specific Information on Speech Recognition Technologies
We wished to establish whether the respondents were aware of the difference in speed between speaking and typing. According to Nuance, the maker of the Dragon speech recognition software, speaking is three times faster than typing. The largest group of respondents, i.e. 45 (47.9%), believed that speech was two times faster than typing, while 37 (39.4%) participants responded that it was three times faster. Only two participants were of the opinion that speaking was slower than typing, three assumed that it was four times faster, while another four responded that the two activities were of equal speed (cf. Figure 2, where responses are provided as option Other, after the option 4x faster).
It was no surprise to see that almost half of the participants (44.7%) responded that they have already used the built-in dictation software on their smartphones; nevertheless, the number is much lower for computers, where only 11 of the 94 participants reported using this technology. Interestingly enough, 28 of the participants reported that their dictation was successful, at least sometimes or to some extent. It is fair to assume that with more accurate pronunciation of English the perceived success rate would most likely be even higher. Some participants also pointed out that they used the dictation option only on their smartphones, without ever realising that this was also possible on their computers.
In all, 70 (74.5%) of the participants responded that they would consider using dictation in their translation work; even more, i.e. 84 (89.4%) believed that it would be useful to work with speech recognition tools as part of their translator training at the university. In additional, individual comments, the participants provided a number of reasons why they assumed it would be useful to work with speech recognition tools as part of translator training (cf. Figure 3).
P14: "It could improve the student's pronunciation skills and, more importantly, the proper flow of speech."
P8: "Speech recognition tools are great for improving ones pronunciation and I think we should focuse on that and phonetics in general more thoroughly."
P4: "I believe that students should be familiar with any translation-or language-related technology. This can be useful in their careers."
P53: "I think that such thing as a speech recognition tool would help me a lot with my poor pronunciation."
P16: "Working with these tools would improve our pronounciation."
P15: "I think we would be able to translate everything faster. And we would also practice our pronunciation and expand our vocabulary, because when we say something outloud, we remember it faster."
P18: "So that we learn different approaches to translating and figure out for ourselves which best suits us. Also I think it is less time consuming than typing and prevents you from making spelling mistakes"
P23: "It's a tool that is becoming increasingly popular and it could potentially make future work easier."
P29: "Knowledge of new technologies is always useful, the more you know the more you can learn, new skills can easily improve our employability, variation of skills is important for adapting to the market"
P19: "The more education we get -conected to our studies and technology connected to languages -the better."
P20: "Speech recognition tools are developing and becoming a bigger part of our everyday life".
P42: "It would improve out studying and it would be a variation of "teaching" that is not often used."
P49: "Because any aspect of the translation work that we are presented is welcome and useful. Anything that we learn might come in handy and we are better because of each of those experiences."
P56: "So that we learn different techniques and figure out which approach best suits us."
P59: "The advancement of technology will impose these tools sooner or later and it would be best if the new generations of translators and interpreters had mandatory training with them."
¹³ All comments by the participants are provided in their original form, verbatim, with spelling mistakes and other errors left unchanged.
In addition, only three other online speech recognition tools were mentioned by the participants who were offered an option to list any other speech recognition technologies of which they might be aware: one of the participants noted using Google Keep, while another participant had not only heard of but had also tried Voice Notepad for Slovene (they reported, however, that their dictation work was not highly successful). It is interesting to note that 32 (i.e. 34%) of the participants responded that they have already tried using some of the tools mentioned, selecting mainly Apple Dictation, Google Docs Voice Typing and Windows Speech Recognition (only two participants selected Speechnotes, while only one mentioned IBM's Speech to text and another one Via Voice, cf. Figure 4). Most of the participants (62, or 66%) learnt about these speech tools online, by themselves, and only five (i.e. 5.3%) at the university.
Judging from the comments provided in the questionnaire, some participants are also aware of the drawbacks of the current speech recognition technologies and of their limited reliability. As can be observed from the participants' comments, the predominant idea is that speech is "faster than typing", and that the application of speech technologies could make translators more efficient. Some students are well aware of the current situation in the ever-evolving digital world, recognising that "the use of speech recognition today is growing and many people use it on their phones (Siri) or have devices (Alexa) that help them with everyday tasks." (P34) All this suggests it might be worth raising the awareness of the trainee translator population about the existence of such tools, and possibly even integrating speech recognition technologies into translator training. This could be achieved in several ways: either by incorporating information on speech recognition technologies into the already existing technology-related courses, or by introducing it as part of a new course focusing on this particular topic with hands-on training within L1 to L2 translation modules.
Some studies have already shown (cf. Mees et al. 2013; Désilets et al. 2008) that the implementation of speech technologies in translation work is something that could be better addressed in the future. There are also interesting pedagogical implications: if dictation is soon to become an increasingly dominant mode of communication, it is important to gain an in-depth insight into the aspects of pronunciation that would be particularly relevant in translator training.

Conclusion
The present study explored the perceptions of trainee translators studying at the Department of Translation Studies at the University of Ljubljana on pronunciation and speech technologies. The results of the study offer good grounds for a more prominent role to be assigned to both pronunciation instruction and speech technologies in translator training. The study yielded results showing that an overwhelming majority of trainee translators (just under 90%) believe that having good pronunciation of English is important for their profession (cf. Hirci 2017), while over 80% also have aspirations to improve their pronunciation. In addition, the results show that all the participants believe it is important to speak English well to make a good impression on clients and employers; all but one find this important for interpreters, while 93.6% also find it important for translators. Moreover, 95.7% of respondents stated that it is important to speak well to sound professional, and 88.3% believe this is important to be able to use speech recognition tools more easily.
These results suggest that equipping trainee translators with pronunciation skills for speech recognition technologies is of relevance and would most likely be embraced by the students. This is in line with the study by Mees et al. (2013, 149), whose retrospective interviews revealed that "a number of students feel that they have become more aware of their pronunciation problems in the course of training the SR [speech recognition] program". Their study also revealed that speech recognition "provides a potentially useful supplement to written translation, or indeed an alternative to it" (Mees et al. 2013, 140-42). The immediate time-efficiency aspect is therefore yet another reason why speech recognition technologies could be applied in translator training: a new modality could also enhance the learning experience in the translation classroom. As some participants of this study have observed, "Time is valuable. Every second saved from sitting in front of a screen and keyboard is warmly welcome" (P41) or "It is faster, so they can earn more money in a shorter period of time and thus have more free time. :)" (P68). With the increasingly rapid advances in voice-activated technologies, translator trainers should seize the opportunity to retain tech-savvy students' interest and channel it into their regular coursework. Staying ahead is vital to remaining competitive; having that special 'edge' might be a deciding factor in having trainee translators turn into successful players on the professional translation market. Thus aiming to have good pronunciation and to speak English well enough to be able to work with speech recognition technologies could prove to have added value for translators' professional careers.