PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). (c) Oxford University Press, 2015. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy).

Translation Technology

Abstract and Keywords

In today’s market, the use of technology by translators is no longer a luxury but a necessity if they are to meet rising market demands for the quick delivery of high-quality texts in many languages. This chapter describes a selection of computer-aided translation tools, resources, and applications, most commonly employed by translators to help them increase productivity while maintaining high quality in their work. This chapter also considers some of the ways in which translation technology has influenced the practice and the product of translation, as well as translators’ professional competence and their preferences with regard to tools and resources.

Keywords: translation technology, computer-aided translation tools, electronic translation resources, electronic translation applications

35.1 Introduction

It did not take long following the development of the first computers for researchers to turn their attention to applying these computers to natural language processing tasks. In the period immediately following World War II, initial attempts were made to develop fully automatic, high-quality machine translation systems intended to replace translators. However, researchers soon came to appreciate that translation is a highly complex task that consists of more than mere word-for-word substitution. It proved very challenging to programme computers to take into account contextual, pragmatic, and real-world information. Consequently, while research into machine translation is still ongoing (see Chapter 34 of this volume), researchers and developers have broadened the scope of natural language processing applications to include computer-aided translation (CAT) tools, which aim to assist, rather than replace, professional translators (Austermühl 2001; Bowker 2002; Quah 2006; L’Homme 2008; O’Hagan 2009; Kenny 2011; Zetzsche 2017).

Increased interest in CAT has been needs-driven—on the part of both clients and translators—as recent decades have witnessed considerable changes in our society in general, and in the translation market in particular. Most texts are now produced in a digital format, which means they can be processed by computer tools. Largely as a result of globalization, there has been a significant increase in the volume of text that needs to be translated into a wide variety of languages. In addition, new types of texts, such as web pages, have appeared and require translation. Furthermore, because companies want to get their products onto the shelves in all corners of the world as quickly as possible, and because electronic documents such as web pages often contain content that needs to be updated frequently, deadlines for completing translation jobs seem to be growing ever shorter.

The demands of our fast-paced, globalized knowledge society have left translators struggling to keep pace with the increasing number of requests for high-quality translation into many languages on short deadlines. However, these two demands of high quality and fast turnaround are frequently at odds with one another. Therefore, one way that some translators are trying to balance the need for high quality with the need for increased productivity is by turning to electronic tools and resources for assistance (see section 35.5).

A wide range of electronic tools and resources are of interest to translators to help them carry out various translation-related tasks; however, CAT tools are typically considered to be those designed specifically with the translation task proper in mind, rather than tools intended for general applications (e.g. word processors, spelling checkers, e-mail, workflow and project management). Tools commonly considered to fall under the CAT umbrella include translation memory systems, terminology management systems, term extractors, concordancers, localization tools, and even machine translation systems—all of which have been observed in use in recent workplace studies involving professional translators (e.g. LeBlanc 2013). Indeed, combinations of some or all these tools are sometimes bundled into a tool suite, which is increasingly referred to as a Translation Environment Tool (TEnT). Some individual tools are more automated than others, and it is helpful to consider CAT as part of a continuum of translation possibilities, where various degrees of machine assistance or human intervention are possible. This chapter will focus on tools and resources that support the translator during the translation process, whereas machine translation is covered in detail in Chapter 34 of this volume.

The rest of this chapter is divided into two main parts which are organized as follows. The first part focuses on translation tools, beginning with an overview of TEnT tools, followed by a more detailed look at some of their principal components. The first of these is the translation memory system, which is the core piece around which the other TEnT components are built. Other tools in the TEnT suite that are described include terminology management systems and term extractors, bilingual concordancers, quality assurance checkers, and project management and translation workflow tools. Next, there is a brief description of localization tools, which are used by translators to adapt websites, software, and videogames from one language and culture to another. Following this discussion of translation tools, the focus shifts to an examination of web-based resources and applications for translators: from general-reference electronic resources, search engines, portals, directories, and the like to more sophisticated look-up tools, cross-language and multilingual information retrieval systems, web-searchable and web corpora. The closing section offers a cursory look at translators’ technology habits, awareness, and IT competence.

35.2 Translation Environment Tools

As noted above, it is becoming increasingly common to find a range of CAT tools integrated into a tool suite or TEnT. The term TEnT has appeared in the translation technology literature, along with other competing terms, since the tool was first conceived. However, recently, TEnT has been very strongly championed by Jost Zetzsche (2006). Other terms for a TEnT that commonly appear in the literature are translation workstation (Melby et al. 1980), translator’s workstation (Hutchins 1998), or translator’s workbench (popularized by Trados, one of the largest TEnT distributors that has since merged with SDL to form a new company known as SDL Trados). A TEnT allows its various components to interact or to use the output of one tool as the input for another (Somers 2003a). In fact, TEnTs are the most popular and widely marketed translation tools in use today. While the individual components differ from product to product, some of the elements that are frequently found in a TEnT, along with a description of their basic functions, are summarized in Table 35.1.

Table 35.1. Some common TEnT components

| TEnT component | Brief description |
| --- | --- |
| Active terminology recognition | Scans a new source text, consults a specified termbase, and automatically suggests and/or replaces any terms in the text with their target-language equivalents from the termbase. |
| Bitext aligner | Segments original source and target texts into sentence-like units and matches up the corresponding segments to create an aligned pair of texts known as a bitext, which is a parallel corpus that forms the basis of a translation memory database. |
| Concordancer | Searches a (bi)text for all occurrences of a user-specified character string and displays these in context. |
| Document analysis module | Compares a new text to translate with the contents of a specified translation memory database or termbase to determine the number/type of matches, allowing users to make decisions about pricing, deadlines, and which translation memory databases to consult. |
| Machine translation system | Generates a machine translation for a given segment when no match is found in the translation memory database. |
| Project management module | Helps users to track client information, manage deadlines, and maintain project files for each translation job. |
| Quality control module | May include spelling, grammar, completeness, or terminology-controlled language-compliance checkers. |
| Term extractor | Analyses (bi)texts and automatically identifies candidate terms. |
| Terminology management system | Aids in the storage and retrieval of terms and terminological information. The contents form a termbase, which is often used during translation in conjunction with translation memory systems. |
| Translation memory system | Searches an aligned database to allow a translator to reuse previously translated material. |

As noted above, not every TEnT contains all possible components, but there are many TEnTs available on the market today, and users can undoubtedly find one that meets their needs from options such as Across, Déjà Vu, Google Translator Toolkit, Heartsome, JiveFusion, memoQ, MetaTexis, MultiTrans, OmegaT, SDL Trados Studio, Similis, Star Transit, or WordFast Pro, among others.

35.2.1 Translation Memory Systems

While the components of specific TEnTs differ, the main module around which all TEnTs are constructed is a translation memory (TM) system. This component, which is the tool most widely used by individual translators, translation agencies, and other organizations involved in translation (Garcia 2007; O’Hagan 2009; Christensen and Schjoldager 2010), usually functions in close association with a terminology management system, which can help translators to optimize TM use (Gómez Palou 2012). Currently, there is a plethora of TM systems to choose from on the market, and these come with a variety of characteristics, features, and price tags. These range from well-established, segment-based systems (e.g. SDL Trados), through those taking a bitext-based approach (e.g. MultiTrans), to very streamlined systems (e.g. WordFast Classic), open source systems (e.g. OmegaT), and even cloud-based systems (e.g. Lingotek Translation Workbench).

A TM system allows translators to store previously translated texts in a database and then easily consult them for potential reuse (Bowker 2002; Somers 2003b; Doherty 2016). Note that when a TM system is first acquired, its database will be empty. The TM database can be populated either by importing legacy documents (i.e. previously translated texts and their corresponding source texts), or by simply beginning to translate. With the latter approach, each sentence that the translator translates can be added to the TM database; however, it may be some time before the database becomes large enough to regularly return useful matches.

To facilitate the retrieval of information from the database, the source and target texts must be aligned. In conventional segment-based TM systems, the database is created by first dividing the texts into segments. Segments are typically complete sentences, but they may also be other sentence-like units, such as document headings, list items, or the contents of a single cell in a table. Next, in a process called alignment, the aligner tool associated with the TM system must link each segment from the source text to its corresponding segment in the target text and then store these pairs—known as translation units—in a TM database. This is the approach used by such well-known TM systems as SDL Trados Translator’s Workbench. Other systems, such as Multicorpora’s MultiTrans, use an aligned bitext approach. In this case, rather than storing each pair separately, the source and target texts are preserved and aligned in their entirety, allowing all segments to be viewed in their larger context. However, it must be noted that many segment-based TMs have since introduced mechanisms that allow them to preserve the order of the pairs of translation units stored in their databases, thereby making it possible to reconstruct the original context on-the-fly (Benito 2009). Having access to previous and following segments—whether in a segment-based or a bitext-based system—is a necessity if the TM system is to be able to identify in-context matches (see Table 35.2). Automatic alignment may pose numerous challenges because translators do not always present information in the target text in the same order in which it was presented in the source text. Similarly, they may split or conflate sentences as they are translating, so that the source and target text do not match up in a one-to-one way at the sentence level (Bowker 2002).
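The segmentation and alignment steps described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the algorithm of any particular TM system: segments are split on sentence-final punctuation only, and alignment is strictly one-to-one in document order. The function names `segment` and `align_one_to_one` are hypothetical.

```python
import re


def segment(text: str) -> list[str]:
    """Split text into sentence-like segments.

    Naive rule (an assumption for this sketch): a segment ends at '.', '!'
    or '?' followed by whitespace. Real tools also handle abbreviations,
    headings, list items, and table cells.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def align_one_to_one(source_text: str, target_text: str) -> list[tuple[str, str]]:
    """Pair the n-th source segment with the n-th target segment.

    Each pair is a translation unit. The function refuses to guess when the
    segment counts differ, which is exactly the case where a translator has
    split, merged, or reordered sentences.
    """
    src = segment(source_text)
    tgt = segment(target_text)
    if len(src) != len(tgt):
        raise ValueError(
            f"cannot align 1:1: {len(src)} source vs {len(tgt)} target segments"
        )
    return list(zip(src, tgt))


if __name__ == "__main__":
    units = align_one_to_one(
        "Click OK to display messages. Close the window.",
        "Cliquez sur OK pour afficher les messages. Fermez la fenêtre.",
    )
    for source, target in units:
        print(source, "=>", target)
```

The `ValueError` branch marks the point at which automatic alignment becomes difficult: once sentences have been split or conflated, a positional pairing is no longer reliable and a more sophisticated aligner, or manual correction, is needed.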

Once the aligned TM databases have been created, they can be consulted by the TM system to determine if any of their contents can be reused. Even before the translation process begins, a document analysis module can compare the new source text against the contents of the TM database and its associated termbase in order to calculate how many matches will be found, and what types of matches these will be (see Table 35.2). This information can be useful for both clients and translators. For example, it can help a translator to plan how much time will be needed for the project, or which TM databases or termbases can usefully be consulted. Meanwhile a client might use the information to help predict how much the job will cost, or what deadline might be reasonable.

Before the translator begins translating, the source text must be imported into the TM environment. Some TM systems rely on third-party text editors (e.g. MS Word) to allow translators to process the texts (e.g. JiveFusion, MultiTrans), others have a proprietary text editor built into their interface (e.g. Déjà Vu, OmegaT), while still others let the translator work in a browser-based environment (e.g. Lingotek). Note that once translated, the target texts can be exported into any format supported by the tool (e.g. .doc, .rtf). The TM databases themselves can also be exported and imported, and the most popular method for doing so is to use the Translation Memory eXchange (TMX) file format, which is an open XML standard that has become widely adopted by the translation and localization industries for exchanging TM data between different tools. Similarly, the XML-based Term Base eXchange (TBX) file format can be used to exchange data between the terminology tools that are associated with most TM systems (Savourel 2007).
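To give a sense of what a TMX exchange involves, the following sketch writes a list of translation units to a minimal TMX-style document and reads it back, using only Python's standard `xml.etree.ElementTree` module. The element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) follow the TMX 1.4 layout as commonly described, but the header values, the function names `build_tmx` and `read_tmx`, and the choice of English and French are illustrative assumptions; a production file should be validated against the TMX specification.

```python
import xml.etree.ElementTree as ET

# ElementTree represents the xml:lang attribute with its full namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, src_lang="en", tgt_lang="fr") -> str:
    """Serialise (source, target) pairs as a minimal TMX-style string."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in units:
        tu = ET.SubElement(body, "tu")      # one translation unit
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def read_tmx(xml_text, src_lang="en", tgt_lang="fr"):
    """Recover (source, target) pairs for one language pair."""
    root = ET.fromstring(xml_text)
    units = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        if src_lang in segs and tgt_lang in segs:
            units.append((segs[src_lang], segs[tgt_lang]))
    return units


if __name__ == "__main__":
    sample = [("Click OK to display messages.",
               "Cliquez sur OK pour afficher les messages.")]
    document = build_tmx(sample)
    print(document)
    print(read_tmx(document))
```

Because each translation unit carries its language codes explicitly, a receiving tool can import the pairs without knowing anything about the internal database of the tool that exported them.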

After the source text has been imported, the next step is to see if there are any matches in the TM database. First-generation TM systems attempt to find matches at the segment level. The TM system divides the source text into segments and then compares each of these against the contents of the TM database. Using a purely form-based pattern-matching technique, the TM system determines whether the segment contained in the new text has been previously translated as part of a text that is stored in the TM database. Early first-generation systems were able to identify exact matches and fuzzy matches at the segment level (see Table 35.2).
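Form-based matching at the segment level can be illustrated with a short sketch. It assumes a word-level similarity ratio computed with `difflib.SequenceMatcher` from the Python standard library; this is a stand-in for illustration, not the scoring formula of any commercial TM system. The function names `similarity` and `lookup`, the sample memory, and the default threshold of 0.7 are hypothetical choices.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means the token sequences are identical."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def lookup(segment: str, memory, threshold: float = 0.7):
    """Return (score, source, target) tuples at or above the threshold, best first.

    `memory` is an iterable of (source, target) translation units.
    """
    hits = [(similarity(segment, source), source, target) for source, target in memory]
    hits = [hit for hit in hits if hit[0] >= threshold]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


if __name__ == "__main__":
    memory = [
        ("Click OK to display messages.", "Cliquez sur OK pour afficher les messages."),
        ("Save the file before closing.", "Enregistrez le fichier avant de fermer."),
    ]
    for score, source, target in lookup("Click OK to display changes.", memory):
        print(f"{score:.0%}  {source}  =>  {target}")
```

With this scoring, the query differs from the first stored segment in one token out of five, so the ratio is 0.8, while the second stored segment shares no tokens and is filtered out. A commercial tool applying its own weighting may report a different figure for the same pair, such as the 75% shown in Figure 35.1.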

Certain types of texts do indeed contain a high number of matches at the segment level, such as a text which is a revision of a previous document, or documentation for a new product which differs only slightly from a previous model. Nevertheless, it soon became apparent that a greater number of matches could be made in a wider variety of text types if the units of comparison were smaller than a complete segment. Second-generation TM systems, such as Similis, therefore took the next logical step. Instead of seeking exact or fuzzy matches at the level of a complete segment, these tools seek matches at the sub-segment or ‘chunk’ level (Colominas 2008). However, while alignment at the segment level may be challenging, alignment at the sub-segment level is even more so because translation does not simply consist of word-for-word substitution. Nevertheless, research in this area is advancing rapidly, and some of the best-performing machine translation systems today are the statistical and example-based systems that assemble translations from different sub-sentential units (e.g. see Koehn 2010; Chapter 34 in this volume).

A drawback associated with sub-segment look-up in a TM system is that translators may be presented with a high number of translation suggestions for basic vocabulary words or fuzzy chunk matches, which could prove to be more distracting than helpful. As suggested by Macken (2009), in order for sub-segment matches to be more useful, improvements need to be made at the level of word alignment to improve both precision and recall. Ideally, the matching mechanism would also be able to take morphological variants into account.
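The trade-off described above can be made concrete with a small sketch of sub-segment look-up. It is only an approximation under stated assumptions: chunks are contiguous runs of identical word tokens found with `difflib.SequenceMatcher`, no morphological variants are recognized, and the hypothetical `min_len` parameter is a crude filter against matches on basic vocabulary.

```python
from difflib import SequenceMatcher


def shared_chunks(new_segment: str, stored_segment: str, min_len: int = 3) -> list[str]:
    """Return contiguous word sequences of at least `min_len` words found in both segments.

    Matching is purely form-based (lower-cased tokens split on whitespace),
    so 'button' and 'buttons' are treated as different words.
    """
    a = new_segment.lower().split()
    b = stored_segment.lower().split()
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return [" ".join(a[m.a:m.a + m.size]) for m in blocks if m.size >= min_len]


if __name__ == "__main__":
    new = "Press the power button to restart the device."
    stored = "To turn it off, press the power button for five seconds."
    print(shared_chunks(new, stored))             # ['press the power button']
    print(shared_chunks(new, stored, min_len=5))  # []
```

Lowering `min_len` to 1 would surface every shared word and so reproduce the distraction problem noted above, whereas a high value suppresses noise at the cost of missing shorter but useful chunks.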

This has inspired other researchers to turn their attention to developing techniques that analyse segments not only with regard to syntax, but also semantics, in what may be termed third-generation TM systems. Pekar and Mitkov (2007), Marsye (2011), Gupta and Orasan (2014), Timonera and Mitkov (2015), Gupta et al. (2016a) and Gupta et al. (2016b) are among those who are developing methods for finding matches that may be semantically equivalent, even if they present syntactic differences resulting from linguistic phenomena such as inflection, compounding, passive constructions, clause embedding or paraphrase.

Regardless of the method used, when the TM system finds matches in the TM database for a given segment, these are presented to the translator, allowing him or her to see how that segment was previously translated and to decide whether that previous translation can be usefully integrated into the new translation (see Figure 35.1). In keeping with the CAT philosophy, where tools are designed to assist rather than to replace the translator, a translator is never obliged by the system to accept the matches identified by the TM system; these are offered only for consideration and can be accepted, modified, or rejected as the translator desires. While there are no system-imposed obligations to accept matches, some clients or employers may have guidelines that do require translators to accept certain types of matches (LeBlanc 2013). Moreover, if no match is found for a given segment, the translator will have to translate it from scratch or send it to an integrated machine translation system to produce an initial draft, which may then be post-edited before being stored in the TM database for future reuse (Garcia 2009; Joscelyne and van der Meer 2007; Reinke 2013).

Table 35.2. Types of matches commonly displayed in TMs

| Match type | Description |
| --- | --- |
| Exact match (100% match) | A segment from the new source text is identical in every way to a segment stored in the TM database. |
| In-context exact match (ICE or 101% match) | An exact match for which the preceding and following segments that are stored in the TM database are also exact matches. |
| Full match | A segment from the new source text is identical to a segment stored in the TM database save for proper nouns, dates, figures, formatting, etc. |
| Fuzzy match | A segment from the new source text has some degree of similarity to a segment stored in the TM database. Fuzzy matches can range from 1% to 99%, and the threshold can be set by the user. Typically, the higher the match percentage, the more useful the match; many systems have default thresholds between 60% and 70%. |
| Sub-segment match | A contiguous chunk of text within a segment of the new source text is identical to a chunk stored in the TM database. |
| Term match | A term found in the new source text corresponds to an entry in the termbase of a TM system’s integrated TMS. |
| No match | No part of a segment from the new source text matches the contents of the TM database or termbase. The translator must start from scratch or call on an integrated machine translation to propose a solution. |
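The distinctions drawn in Table 35.2 can be expressed as a simple decision procedure. The sketch below is a simplification under several assumptions: the TM is represented as an ordered list of stored source segments, a 'full match' is approximated by masking only figures written with digits (proper nouns and formatting are ignored), fuzzy similarity is a word-level `difflib` ratio, and sub-segment and term matches are left out. The names `normalise` and `classify` are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Figures and digit-based dates such as 10, 3.5, 18/12/2017 (an illustrative pattern).
PLACEABLE = re.compile(r"\d+(?:[.,:/-]\d+)*")


def normalise(segment: str) -> str:
    """Replace figures with a placeholder so segments differing only in numbers compare equal."""
    return PLACEABLE.sub("<NUM>", segment)


def classify(new_segs, i, tm_segs, threshold=0.7) -> str:
    """Classify new_segs[i] against tm_segs, an ordered list of stored source segments."""
    seg = new_segs[i]
    exact_positions = [j for j, stored in enumerate(tm_segs) if stored == seg]
    for j in exact_positions:
        # In this sketch the first and last segments of a text can never be
        # in-context matches, because one neighbour is missing.
        same_prev = i > 0 and j > 0 and new_segs[i - 1] == tm_segs[j - 1]
        same_next = (i + 1 < len(new_segs) and j + 1 < len(tm_segs)
                     and new_segs[i + 1] == tm_segs[j + 1])
        if same_prev and same_next:
            return "in-context exact match"
    if exact_positions:
        return "exact match"
    if any(normalise(stored) == normalise(seg) for stored in tm_segs):
        return "full match"
    best = max(
        (SequenceMatcher(None, seg.split(), stored.split()).ratio() for stored in tm_segs),
        default=0.0,
    )
    return "fuzzy match" if best >= threshold else "no match"


if __name__ == "__main__":
    tm = ["Open the File menu.", "Click OK to display messages.",
          "Wait 10 seconds.", "Close the window."]
    new = ["Open the File menu.", "Click OK to display messages.", "Wait 10 seconds.",
           "Wait 30 seconds.", "Click OK to display changes.", "Unplug the printer."]
    for index, text in enumerate(new):
        print(f"{classify(new, index, tm):<24} {text}")
```

The order of the checks mirrors the usual ranking of match types: the strictest test is tried first, and a segment falls through to the next category only when it fails.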

Figure 35.1. An example of a 75% fuzzy match retrieved from a TM database.

New source text segment to translate

Click OK to display changes.

Fuzzy match and corresponding translation retrieved from TM database

  • EN: Click OK to display messages.

  • FR: Cliquez sur OK pour afficher les messages.

As noted above, increased productivity and improved quality are some of the regularly acknowledged benefits of adopting TM systems (LeBlanc 2013). Nevertheless, the introduction of a new tool will almost certainly impact the existing workflow and can affect—both positively and negatively—the translation process and product (Doherty 2016).

Given the way TMs operate, any gain in efficiency depends on the TM’s ability to return matches. Texts that are internally repetitive or that are similar to others that have already been translated (e.g. revisions, updates, and texts from specialized fields) will tend to generate useful matches. Texts that are less ‘predictable’ (e.g. marketing material) will not. Nevertheless, in over 300 hours of observing translators in the workplace, LeBlanc (2013) notes that TMs are used for nearly all texts, no matter the type (general/administrative, technical, specialized) or the subject field. This practice was applied even when the text in question was not particularly well-suited to TM use, and the result in many cases was that the TMs retrieved very little reusable text and in some cases nothing at all.

If matches are found, simply being able to automatically copy and paste desired items from the TM database or termbase directly into the target text can save translators typing time while reducing the potential for typographic errors. However, significant gains in productivity are usually realized in the medium to long term, rather than in the short term, because the introduction of CAT tools entails a learning curve during which productivity could decline. Moreover, a number of independent translators find these tools so challenging to use that they simply give up on them before realizing any such gains (Lagoudaki 2006). In addition, with so many competing products on the market, translators may find that different clients or different jobs require the use of different tools, which adds to the learning curve. Sharing across different products is becoming easier as standards such as Translation Memory eXchange (TMX), TermBase eXchange (TBX) and XML Localization Interchange File Format (XLIFF) are becoming more widely adopted (Savourel 2007: 37). In addition, cloud-based access to TM databases is also becoming an increasingly common model (Garcia 2007; Gambín 2014).

With regard to quality, CAT still depends on human translators. If a client has an existing TM database, and the client specifies that this database must be used for the translation project, then the translator has no control over its contents. If not properly maintained, TM databases can easily become polluted and can in fact propagate errors rather than contributing to a higher-quality product (LeBlanc 2013). Furthermore, the segment-by-segment processing approach underlying most TM tools means that the notion of ‘text’ is sometimes lost (Bowker 2006; LeBlanc 2013). For example, translators may be tempted to stay close to the structure of the source text, neglecting to logically split or join sentences in the translation. To maximize the potential for repetition, they may avoid using synonyms or pronouns. Although such strategies may increase the number of matches generated by a TM, they risk detracting from the overall readability of the resulting target text. Moreover, in cases where multiple translators have contributed to a collective TM database, the individual segments may bear the differing styles of their authors, and when brought together in a single text, the result may be a stylistic patchwork. In addition, translators sometimes feel stifled by having to adhere to the sentence-by-sentence mould. By contrast, LeBlanc (2013) observes that, for some texts, translators are relieved that TMs can eliminate certain types of tedious and repetitive work.

Some forms of translation technology may also affect the professional status of translators, their remuneration, and their intellectual property rights. For instance, some clients ascribe less value to the work of the translator who uses a TEnT, suggesting that if working with such tools is faster and easier than unaided human translation, then they wish to pay less for it. In response, some translators working with such technologies are developing new payment models (e.g. a volume-based tiered-pricing model) (Joscelyne and van der Meer 2007). Another trend beginning to appear is an increased commoditization and sharing of resources such as TM databases and termbases, which raises ethical questions regarding the ownership of such resources (Gow 2007). But beyond issues of payment, translators sometimes feel that their professional autonomy is being affronted when they are required to reuse an exact match from the TM database as it is, even when they feel that it is not suitable to the larger context, and they see this business-driven practice as a major step in the wrong direction (LeBlanc 2013). The loss of autonomy, coupled with the potential deskilling that some translators feel comes as part and parcel of an overreliance on technology and of being left out of the decision-making around issues of productivity and quality, has in some cases led to a feeling of de-professionalization and reduced job satisfaction. However, when translators feel more involved in the development of technologies (Koskinen and Ruokonen 2017), or when the use of tools is not tightly bound to productivity requirements (LeBlanc 2017), then such professional concerns are less pronounced and translators tend to have more positive feelings toward the use of tools. 
In the same vein, crowdsourcing and collaborative translation are changing the translation technology landscape, as non-professionals begin to use and share TM and MT systems for non-profit, unpaid work (Garcia 2009; Jiménez-Crespo 2017).

Translators may begin to see themselves in a different light after working with TM tools. For instance, LeBlanc (2013) reports that, in interviews with over fifty translators, more than half of them reported feeling as though TM use was having an effect on their natural reflexes, leading to a sort of dulling or erosion of their skills. Some reported having less trust in their instincts, confessing that they sought to avoid translating from scratch and preferred the ‘collage’ method of building a solution around various sub-segment matches found in the TM database. However, others admitted that they were at times relieved when the TM had nothing to offer as this allowed them to translate more freely, in some respects. Others wondered if translation would continue to be a profession to which creative types would be attracted.

Finally, novice translators may need to be extra careful when it comes to TM use. On the one hand, translators interviewed as part of LeBlanc’s (2013) workplace study touted the pedagogical potential of TMs, which could open up a whole array of possibilities in that they become a tool that allows translators to benefit and learn from one another’s insights. On the other hand, novice translators may rely too heavily on TMs, treating them as a crutch that gets in the way of offering their own solutions. Interestingly, this observation was made not only by senior translators and revisers, but also by beginner translators themselves, who suggested that TM use should be limited in the early years on the job. This would allow novice translators to gain a better understanding of the complexity of the translation process and to familiarize themselves with other tools that are available to them, as well as to develop the critical judgement required to effectively assess the suitability of the TM’s proposals (Bowker 2005).

35.2.2 Terminology Tools

While TM systems are at the core of TEnTs, they are almost always fully integrated with terminology tools, which can further enhance their functionality (Steurs, De Wachter and De Malsche 2015). A terminology management system (TMS) is a tool that is used to store terminological information in and retrieve it from a terminology database or termbase. Translators can customize term records with various fields (e.g. term, equivalent, definition, context, source), and they can fill these in and consult them at will in a stand-alone fashion. Retrieval of terms is possible through various search types (e.g. exact, fuzzy, wildcard, context) (Bowker 2003).
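A term record and the search types mentioned above can be modelled with a small data structure. The sketch below is illustrative only: the field names follow the examples given in the text, while the class names `TermRecord` and `Termbase`, the sample entries, and the use of `fnmatch` for wildcard search and `difflib.get_close_matches` for fuzzy search are assumptions. Context search is omitted.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from fnmatch import fnmatchcase


@dataclass
class TermRecord:
    term: str
    equivalent: str
    definition: str = ""
    context: str = ""
    source: str = ""


class Termbase:
    """A toy in-memory termbase keyed on the lower-cased term."""

    def __init__(self) -> None:
        self._records: dict[str, TermRecord] = {}

    def add(self, record: TermRecord) -> None:
        self._records[record.term.lower()] = record

    def exact(self, query: str) -> list[TermRecord]:
        record = self._records.get(query.lower())
        return [record] if record else []

    def wildcard(self, pattern: str) -> list[TermRecord]:
        # Shell-style patterns: '*' matches any run of characters, '?' one character.
        return [r for key, r in self._records.items() if fnmatchcase(key, pattern.lower())]

    def fuzzy(self, query: str, cutoff: float = 0.8) -> list[TermRecord]:
        # Tolerates small spelling differences between the query and stored terms.
        keys = get_close_matches(query.lower(), list(self._records), n=5, cutoff=cutoff)
        return [self._records[key] for key in keys]


if __name__ == "__main__":
    termbase = Termbase()
    termbase.add(TermRecord("stem cell", "cellule souche"))
    termbase.add(TermRecord("stem cell transplant", "greffe de cellules souches"))
    termbase.add(TermRecord("chemotherapy", "chimiothérapie"))
    print([r.term for r in termbase.wildcard("stem*")])
    print([r.equivalent for r in termbase.fuzzy("chemotherpy")])
```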

When used as a stand-alone tool, a TMS is very similar to a termbank in that it is essentially a searchable repository for recording the results of terminological research. However, termbases can also be integrated with TM systems and work in a more automated way. For instance, many TMSs have an active terminology recognition or automatic term look-up feature, which interacts directly with the word processor. The contents of the new source text that the translator has to translate are automatically scanned and compared against those of a specified termbase. Whenever a match is identified, the translator is alerted that there is an entry for that term in the termbase. The translator can then consult the termbase entry and, if desired, paste the equivalent directly into the text with a single click. In fact, some TM systems and TMSs go one step further and offer a function known as pre-translation (Wallis 2008). If pre-translation is activated, the equivalents for all matches are automatically pasted into the text as part of a batch process. The advantages of using this type of integrated TMS are that translators can work more quickly and can ensure that terminology is used consistently.
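Active terminology recognition and pre-translation, as described above, can be approximated as follows. The sketch assumes that the termbase is a plain dictionary mapping source terms to target equivalents, that matching is case-insensitive on whole words, and that longer terms take priority over shorter ones contained within them. The function names `find_terms` and `pretranslate` are hypothetical, and no real tool's behaviour is implied.

```python
import re


def find_terms(segment: str, termbase: dict[str, str]):
    """Return (start, end, matched_text, equivalent) for each termbase term in the segment.

    Longer terms are tried first, and a shorter term is skipped wherever it
    overlaps a longer one that has already been recognized.
    """
    hits, taken = [], []
    for term in sorted(termbase, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(segment):
            if all(match.end() <= start or match.start() >= end for start, end in taken):
                taken.append(match.span())
                hits.append((match.start(), match.end(), match.group(0), termbase[term]))
    return sorted(hits)


def pretranslate(segment: str, termbase: dict[str, str]) -> str:
    """Batch-replace every recognized term with its equivalent, leaving the rest untouched."""
    pieces, position = [], 0
    for start, end, _, equivalent in find_terms(segment, termbase):
        pieces.append(segment[position:start])
        pieces.append(equivalent)
        position = end
    pieces.append(segment[position:])
    return "".join(pieces)


if __name__ == "__main__":
    termbase = {
        "stem cell": "cellule souche",
        "stem cell transplant": "greffe de cellules souches",
    }
    print(pretranslate("A stem cell transplant uses stem cell lines.", termbase))
```

Because the pattern requires a word boundary after the term, the plural 'stem cells' would not be recognized unless it were recorded as a separate entry; the sketch performs no morphological analysis.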

Interestingly, however, translators are beginning to learn that they can optimize the efficiency of a TM system if they modify the way that they record terminology in their termbases (Bowker 2011; Gómez Palou 2012). For instance, it is possible to get a greater number of matches if translators record any frequently occurring expression, even if it does not technically qualify as a specialized term. Similarly, instead of recording only the canonical form of a term, translators can record all frequently used forms (e.g. conjugated forms of a verb). Translators can also mine the contents of the TM databases to feed the termbase. These practices—recording non-terms, recording non-canonical forms, and consulting translated sources—were all discouraged in the days before TMSs were integrated with TM systems; however, in order to maximize the benefits that can be gained by using these technologies together, translators are beginning to change their practices.

To effectively build up the contents of a termbase, another type of terminology tool that can be integrated into a TEnT is a term extractor or term extraction system (see Chapter 40). A term extractor is a tool that attempts to automatically identify all the potential terms in a corpus—such as the bitext or parallel corpus that makes up a TM database—and then presents this list of candidates for verification. While the lists of candidates generated by term extractors are not perfect—there will almost certainly be instances of both noise (non-pertinent items identified) and silence (relevant patterns missed)—they nonetheless provide a useful start for building up a termbase for any translator having to identify terms in a large document or series of texts.

Term extractors can use any of several different underlying approaches (Cabré Castellví, Estopà Bagot, and Vivaldi Palatresi 2001; Lemay, L’Homme, and Drouin 2005; Heylen and De Hertog 2015). Frequency- and recurrence-based techniques essentially look for repeated sequences of lexical items. The frequency threshold, which refers to the number of times that a sequence must be repeated, can often be specified by the translator. Pattern-based techniques make use of part-of-speech-tagged corpora to search for predefined combinations of grammatical categories (e.g. adjective + noun) that typically correspond to term formation patterns. Meanwhile, corpus comparison techniques compare the relative frequency of a given lexical pattern in a small specialized corpus and a larger general reference corpus to determine the likelihood that the pattern corresponds to a term. Moreover, these various approaches can be combined in hybrid term extraction systems.
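Two of the approaches just described, frequency-based extraction and corpus comparison, can be sketched without any external resources. The code below makes several simplifying assumptions: tokens are lower-cased alphabetic strings, word sequences may cross sentence boundaries, a short hand-made stopword list filters candidates, and the corpus comparison adds one to each count simply to avoid division by zero. The names `repeated_sequences` and `relative_frequency_ratio` are hypothetical. Pattern-based extraction is not shown because it requires a part-of-speech tagger.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "are",
             "was", "has", "been", "for", "on", "at", "with", "no"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def repeated_sequences(text: str, n: int = 2, min_freq: int = 2) -> dict[str, int]:
    """Frequency-based extraction: n-word sequences repeated at least `min_freq` times.

    Candidates that begin or end with a stopword are discarded.
    """
    tokens = tokenize(text)
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {
        " ".join(gram): count
        for gram, count in grams.items()
        if count >= min_freq and gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS
    }


def relative_frequency_ratio(word: str, specialized: str, reference: str) -> float:
    """Corpus comparison: how much more frequent a word is in the specialized corpus.

    Values well above 1 suggest that the word is characteristic of the
    specialized corpus and is therefore a plausible term candidate.
    """
    spec, ref = Counter(tokenize(specialized)), Counter(tokenize(reference))
    spec_rel = (spec[word] + 1) / (sum(spec.values()) + 1)
    ref_rel = (ref[word] + 1) / (sum(ref.values()) + 1)
    return spec_rel / ref_rel


if __name__ == "__main__":
    specialized = ("Stem cell research is advancing. No adult stem cell is pluripotent. "
                   "A stem cell transplant was included.")
    reference = "The weather is nice today and the table is in the kitchen."
    print(repeated_sequences(specialized))
    print(round(relative_frequency_ratio("stem", specialized, reference), 2))
```

Raising `min_freq` reduces noise but increases silence, which is the same trade-off a translator makes when setting the frequency threshold in a term extractor.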

35.2.3 Bilingual Concordancers

When no useful results can be found in a termbase or TM database, search tools such as bilingual concordancers—which can operate as stand-alone tools, but which are also now regularly integrated within a TM system—can prove to be extremely helpful for translators who need to conduct terminology research (LeBlanc 2013; Maia 2003). Less automated than a term extractor or TM system, a bilingual concordancer allows translators to search through aligned bilingual parallel corpora (including TM databases) to find information that might help them to complete a new translation (see also section 35.4.3). For example, if a translator encounters a word or expression that he or she does not know how to translate, he or she can search in a bilingual parallel corpus to see if this expression has been used before, and if so, how it was dealt with in translation. By entering a search string in one language, the translator can retrieve all examples of that string from the corpus. As shown in Figure 35.2, the search term ‘stem cell’ has been entered and all the segments in the English portion of the corpus that contain the string ‘stem cell’ are displayed on the left, while the corresponding text segments from the French side of the aligned parallel corpus are shown on the right.

Figure 35.2. Results for a search on the string ‘stem cell’ using a bilingual concordancer and a parallel corpus.

Stem cell research, though advancing quickly, is still at a very innovative stage.

Malgré ses progrès rapides, la recherche sur les cellules souches est encore à un stade très novateur.

However, no adult stem cell has been definitively shown to be completely pluripotent.

Toutefois, on n’a pu démontrer de façon définitive que les cellules souches adultes pouvaient être complètement pluripotentes.

Prior treatment also included cytotoxic chemotherapy, interferon, or a stem cell transplant.

Certains des patients avaient déjà subi une chimiothérapie cytotoxique, un traitement par l’interféron ou une greffe de cellules souches.

Although some bilingual concordancers that operate outside a TM environment require translators to independently compile and align their own parallel corpus, online bilingual concordancing tools are now available which search parallel websites (see section 35.4.2), thus alleviating the burden of corpus construction (Désilets et al. 2008). However, translators must clearly use their professional judgement when evaluating the suitability of the results returned by these tools.
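The basic look-up performed by a bilingual concordancer can be sketched in a few lines of Python. The example below assumes that the parallel corpus has already been aligned into source/target segment pairs; the first three pairs are those shown in Figure 35.2, while the fourth pair, the variable names, and the function name are invented for illustration.

```python
# Aligned English-French segment pairs. The first three come from
# Figure 35.2; the last one is a made-up non-matching example.
BITEXT = [
    ("Stem cell research, though advancing quickly, is still at a very innovative stage.",
     "Malgré ses progrès rapides, la recherche sur les cellules souches est encore à un stade très novateur."),
    ("However, no adult stem cell has been definitively shown to be completely pluripotent.",
     "Toutefois, on n’a pu démontrer de façon définitive que les cellules souches adultes pouvaient être complètement pluripotentes."),
    ("Prior treatment also included cytotoxic chemotherapy, interferon, or a stem cell transplant.",
     "Certains des patients avaient déjà subi une chimiothérapie cytotoxique, un traitement par l’interféron ou une greffe de cellules souches."),
    ("The patient was discharged the next day.",
     "Le patient a reçu son congé le lendemain."),
]


def bilingual_concordance(bitext, query):
    """Return every (source, target) pair whose source segment contains `query`.

    Matching is a simple case-insensitive substring test; the target
    segment is returned as-is, so the translator still has to locate the
    equivalent within it.
    """
    needle = query.lower()
    return [(src, tgt) for src, tgt in bitext if needle in src.lower()]


for source, target in bilingual_concordance(BITEXT, "stem cell"):
    print(source)
    print("    " + target)
```

Note that the tool only retrieves the aligned segments; deciding that ‘cellules souches’ is the relevant equivalent, and whether it suits the new context, remains the translator's task.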

35.2.4 Quality Assurance Checkers

Another type of tool that is becoming increasingly available as part of a TEnT, and which works in conjunction with TM and TMS systems, is a quality assurance checker. This tool compares the segments of source and target texts to detect translation errors such as inconsistent or incorrect term use (when compared to a specified glossary); omitted (empty) segments; untranslated segments (where source and target segments are identical); incorrect punctuation or case; formatting errors; incorrect numbers, tags or untranslatables. Some tools allow the quality checks to be carried out in real time as a translator is working, while others must be applied once the translation is completed.
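A few of these checks are simple enough to sketch in Python. The functions below are a minimal illustration under stated assumptions, not the logic of any commercial checker: the function names, the warning messages, and the decision to compare numbers as bare digit sequences (so that ‘2.5’ and ‘2,5’ count as equal) are all choices made for the example.

```python
import re


def qa_check(source, target):
    """Return a list of warnings for one source/target segment pair."""
    if not target.strip():
        return ["empty target"]
    warnings = []
    if source.strip() == target.strip():
        warnings.append("untranslated segment")
    # Compare the digit sequences found in each segment, ignoring order
    # and ignoring locale-specific separators such as '.' or ','.
    if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
        warnings.append("number mismatch")
    return warnings


def check_terms(source, target, glossary):
    """Flag glossary terms whose prescribed equivalent is missing.

    `glossary` maps a source-language term to the required target term.
    """
    return [
        f"expected '{required}' for '{term}'"
        for term, required in glossary.items()
        if term.lower() in source.lower() and required.lower() not in target.lower()
    ]


print(qa_check("Take 2 tablets.", "Prendre 3 comprimés."))
print(check_terms(
    "Stem cell research",
    "La recherche sur les cellules mères",
    {"stem cell": "cellules souches"},
))
```

Because the term check can only compare against the glossary it is given, it illustrates the limitation discussed below: a deliberate synonym or an inflected form would be flagged just like a genuine error.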

While they are very helpful as a complementary means of assuring quality control, these tools do have some limitations, which users must bear in mind. For example, they cannot detect problems associated with a translator’s incorrect or incomplete understanding of a source text. When checking terminology for correctness or consistency, the tool is limited by the contents of the glossary. Moreover, they work on the assumption that all inconsistencies are undesirable, whereas a translator may have deliberately introduced synonymy or paraphrasing to some part of the text in order to improve its readability or style. In addition, they expect the source text to be correct, which is not always the case, and they flag ‘errors’ in the target text accordingly. Similarly, because they do not understand that source and target languages may have different rules for punctuation or capitalization, they may detect false errors. To overcome this, some quality assurance tools do offer different settings for different language pairs.

These quality checkers are not intelligent and they are not meant to obviate the need for careful proofreading or editing. However, in spite of their limitations, these tools can still be useful, even to experienced translators (Gerasimov 2007). Being able to go directly to the place in the text where an error is located facilitates rapid correction, and eliminating simple errors early in the process saves time at the proofreading stage.

35.2.5 Project Management and Workflow Tools

While project management and translation workflow tools do not help translators with the actual task of translating, they can be useful for helping to manage translation projects, particularly in cases where the project is large and has multiple team members. For example, these tools can be used to help manage and track the assignment of tasks (e.g. translation, revision, proofreading) and deadlines, and to indicate which specific resources (e.g. TM databases, termbases) should be consulted for a given job. They can also help with other administrative tasks, such as managing client information or invoicing. The papers in the volume edited by Dunne and Dunne (2011) provide good coverage of a range of issues relating to project management in translation and localization contexts, including the effective selection and application of project management and workflow tools.

35.3 Localization Tools

Localization is the process of adapting the content of a website, software package, or videogame to a different language, culture, and geographic region. Translation is one part of this process, which may also include technical and visual (e.g. image, colour, or layout) adaptations. To be localized, digital material requires tools and technologies, skills, processes, and standards that are different from or go beyond those required for the adaptation of traditional (e.g. print-based) materials. For example, while a printed text is intended to be read in a linear fashion, the layout and placement of text on a website take on greater importance. Shortcut keys used in software often have a mnemonic value, such as Ctrl-p for ‘print’, and these may need to be adjusted if they are to be meaningful in another language (e.g. in French, the equivalent for ‘print’ is ‘imprimer’, so the mnemonic value of the letter ‘p’ would be lost). Sometimes physical adjustments need to be made, such as to the width of a menu or the size of a button. For instance, a button that is large enough to contain the English word ‘Save’ would need to be resized to accommodate the French equivalent ‘Sauvegarder’. In videogame localization, the main priority is to preserve the gameplay experience for the target players, keeping the ‘look and feel’ of the original. Localizers are given the liberty of including new cultural references, jokes, or any other element they deem necessary to preserve the game experience and to produce a fresh and engaging translation. This type of creative licence granted to game localizers would be the exception rather than the rule in other types of translation (O’Hagan and Mangiron 2013).

To deal with these myriad elements, localization is typically carried out by a team of participants, including a project manager, software engineers, product testers and translators. Esselink (2000) and Dunne (2006) provide good overviews of the general software localization process and the tasks and players involved, while Jiménez-Crespo (2013) explores the intricacies of website localization.

While localization tools themselves are not typically components of TEnTs, it is important to note that many localization tools share a number of the same components as TEnTs, including TM systems and TMSs. This section describes some of the additional features offered by localization tools, with a specific focus on those which are most pertinent to the task of translation proper. Some of the tools currently available on the market today include Passolo and Catalyst (for software localization), CatsCradle and WebBudget (for website localization), and LocDirect (for videogame localization).

Like TEnTs, localization tools group a number of important localizing functions for ease of use. For example, a localization tool will integrate TM and TMS functions into the resource editor or editing environment, and it will provide protection to software elements that should not be changed (i.e. source code). In a software file, translatable text strings (e.g. on-screen messages) are surrounded by non-translatable source code (see Figure 35.3). Localization tools need to extract these translatable strings, provide an interface for translating the strings, and then reinsert the translations correctly back into the surrounding code. Moreover, the translated strings need to be approximately the same length as the original text because the translations have to fit into the appropriate spaces of dialogue windows, menus, buttons, etc. If a size-equivalent translation is not possible, the localization tool must offer resizing options.

Figure 35.3. Translatable text strings (underlined in the original figure; here, the quoted strings other than the class name ‘Button’) embedded in non-translatable computer code.

IDD_DIALOG_GROUPEDIT DIALOG DISCARDABLE 0, 0, 309, 106
STYLE DS_MODALFRAME | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION ‘XML Group Element’
FONT 8, MS Sans Serif
BEGIN
    LTEXT ‘Element &Name’, IDC_STATIC, 7, 14, 49, 8
    EDITTEXT IDC_XML_GROUP_ELEMENT, 77, 12, 140, 14, ES_AUTOHSCROLL
    GROUPBOX ‘&Group Action’, IDC_STATIC, 13, 55, 202, 44
    CONTROL ‘Create new &resource’, IDC_RADIO_NEWRES, ‘Button’, BS_AUTORADIOBUTTON, 20, 68, 84, 10
    PUSHBUTTON ‘Cancel’, IDCANCEL, 252, 24, 50, 14
END
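The extract, translate, and reinsert cycle described above can be approximated in Python as follows. This is a sketch under several assumptions: the resource text uses straight double quotes (as resource scripts typically do), only the first quoted string after a small, hypothetical set of keywords is treated as translatable, and the 25 per cent growth limit in `needs_resizing` is an arbitrary figure chosen for the example.

```python
import re

# Keywords whose first quoted string is taken to be user-visible text.
# This list is an assumption of the sketch, not a complete inventory.
TRANSLATABLE = ("CAPTION", "LTEXT", "GROUPBOX", "CONTROL", "PUSHBUTTON")
FIRST_STRING = re.compile(r'"([^"]*)"')

RESOURCE = '''IDD_DIALOG_GROUPEDIT DIALOG DISCARDABLE 0, 0, 309, 106
CAPTION "XML Group Element"
FONT 8, "MS Sans Serif"
BEGIN
    LTEXT "Element &Name", IDC_STATIC, 7, 14, 49, 8
    CONTROL "Create new &resource", IDC_RADIO_NEWRES, "Button", BS_AUTORADIOBUTTON, 20, 68, 84, 10
    PUSHBUTTON "Cancel", IDCANCEL, 252, 24, 50, 14
END'''


def extract_strings(resource):
    """Collect the translatable strings, leaving the code untouched."""
    strings = []
    for line in resource.splitlines():
        if line.strip().startswith(TRANSLATABLE):
            match = FIRST_STRING.search(line)
            if match:
                strings.append(match.group(1))
    return strings


def reinsert_strings(resource, translations):
    """Put translations back in place; untranslated strings are kept."""
    def swap(match):
        original = match.group(1)
        return '"' + translations.get(original, original) + '"'

    out = []
    for line in resource.splitlines():
        if line.strip().startswith(TRANSLATABLE):
            line = FIRST_STRING.sub(swap, line, count=1)
        out.append(line)
    return "\n".join(out)


def needs_resizing(source, translation, max_growth=1.25):
    """Flag translations noticeably longer than the original string."""
    return len(translation) > len(source) * max_growth


print(extract_strings(RESOURCE))
print(reinsert_strings(RESOURCE, {"Cancel": "Annuler"}))
print(needs_resizing("Save", "Sauvegarder"))
```

The font name and the ‘Button’ class name are quoted too, but they are left alone because they belong to the code rather than to the user interface text.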

While visual localization environments—which allow the translators to translate strings in context and to see the positioning of these translated strings in relation to other strings, controls, and dialogue boxes on the screen—are available for some computing environments and platforms, this is not the case across the board. Therefore, translators working on localization projects frequently have to translate (sub-)strings out of context. These strings are later assembled at runtime, to create the messages that are presented to the user on the screen. However, what may have seemed like a reasonable translation in the absence of a larger context may not work well once it is placed in a larger string. Concatenation at runtime can cause significant problems in the localized digital content and requires careful checking and linguistic quality assurance (Schäler 2010).
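A small Python example shows why runtime concatenation is risky. The string tables and function name below are hypothetical; the point is only that fragments translated in isolation cannot reflect agreement rules that apply once the fragments are joined, such as adjective gender in French.

```python
# Sub-strings translated out of context (English -> French).
FRAGMENTS = {"New": "Nouveau", "file": "fichier", "window": "fenêtre"}

# The same messages translated as complete strings, in context.
MESSAGES = {"New file": "Nouveau fichier", "New window": "Nouvelle fenêtre"}


def concatenate(adjective, noun, table):
    """Assemble a message at runtime from separately translated parts."""
    return f"{table[adjective]} {table[noun]}"


for noun in ("file", "window"):
    assembled = concatenate("New", noun, FRAGMENTS)
    expected = MESSAGES[f"New {noun}"]
    status = "ok" if assembled == expected else "agreement error"
    print(f"{assembled!r} vs {expected!r}: {status}")
```

Since ‘fenêtre’ is feminine, the adjective must become ‘Nouvelle’, which the fragment table has no way of knowing. This is the kind of error that linguistic quality assurance on the assembled product is meant to catch.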

35.4 Web-based Resources and Applications

Translation technologies have proven to be indispensable for professional translators. Using TM systems and other CAT tools enhances the efficiency and cost-effectiveness of translation and multilingual document management. However, such automated resources for translation should not be regarded as a panacea. While TM systems have had an unprecedented impact on the translation industry, they are particularly suitable for highly repetitive texts from a narrow domain (e.g. operating manuals and instructions for use) and for texts that are frequently updated with little change. They do not perform well for more creative, less predictable genres. For a wide range of specialized domains, bitexts (parallel corpora) may be either non-existent or difficult to obtain. The growth of TM files cannot keep pace with the growth of bilingual corpora or bilingual websites, nor can TM files remain representative of dynamically developing domains where new terminology is being proposed on a daily basis (Corpas Pastor 2007).

Secondly, in situations where CAT tools could really provide a smooth and problem-free solution, they are not always of assistance to translators. We have already mentioned that introducing TM systems is a technically challenging procedure with a steep learning curve which could hinder productivity at early stages. But even when translators have managed to overcome this initial drawback, quite often they are unable to retrieve the relevant segments or find themselves left to their own devices. By way of illustration it would suffice to mention that exact repetitions are not necessarily more useful than inexact examples; that TM databases are sometimes populated with too many redundant examples or with confusing, rare, and untypical translations; and that segment fuzzy matching could lead to extracting too many irrelevant examples (noise) or to missing too many potentially useful examples (silence) (Hutchins 2005). Besides, as the fuzzy matching technique is based on the degree of formal similarity (number of characters), and not on content similarity, it is more difficult for TM tools to retrieve segments in the presence of morphological and syntactic variance or in the case of semantic equivalence but syntactic difference, e.g. inflection, compounding, passive constructions, clauses, paraphrases, etc. (Pekar and Mitkov 2007; Timonera and Mitkov 2015; Gupta et al. 2016; see also section 35.2.1 above). Another clear example is when the TM system cannot generate useful matches for a given source language (SL) segment. As noted above, in that case translators have to translate such segments from scratch, get an initial draft from an integrated MT system, use TMS for active terminology recognition and pre-translation, or else resort to term extractors, termbases, and other terminology tools to assist in the process.
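The effect of matching on form rather than content can be illustrated with a character-based similarity score. The Python sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in; commercial TM systems use their own, generally undisclosed, matching algorithms, and the sample memory, the function names, and the 70 per cent threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher

# A hypothetical two-entry translation memory (English -> French).
MEMORY = [
    ("The printer is out of ink.",
     "L'imprimante n'a plus d'encre."),
    ("There is no paper left in the printer.",
     "Il ne reste plus de papier dans l'imprimante."),
]


def fuzzy_score(a, b):
    """Character-level similarity between two segments, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def best_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) triples at or above the threshold,
    best match first."""
    scored = [(fuzzy_score(segment, src), src, tgt) for src, tgt in memory]
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)


new_segment = "The printer is out of paper."
for source, _ in MEMORY:
    print(f"{fuzzy_score(new_segment, source):.2f}  {source}")
print(best_matches(new_segment, MEMORY))
```

The segment that shares most of its characters with the new sentence scores highly even though it concerns ink, whereas the paraphrase that actually conveys the same content falls below the threshold and is never shown to the translator.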

The problems highlighted so far pertain to the TM technology itself. TM systems usually operate at the sentence level to find potential translation units within the aligned bitexts. There seems to be a gap, then, in the matching choices available to translators, as they are either presented with whole segments or just terminology equivalents, but not generally with sub-segments (with some notable exceptions, for example, the ‘assemble’ function of Déjà Vu). The application of TM technology only at the segment level has led to a stagnation of the commercial research and, consequently, to very little improvement on the reuse of TM databases of previously translated texts. The main advances seem to be restricted to the complementary features offered by TM systems (Benito 2009), instead of improving and expanding the use of TM technology by focusing on sub-segment-level matching, pattern-based translation, or semantic similarity computation, in line with the architecture of most EBMT systems (see Chapter 34).

There are various reasons for this situation. Technically speaking, identifying segments which are exact or fuzzy matches seems to be the easiest way to build TM databases from past translations. And from a commercial point of view, calculating translation units at the segment level is also a straightforward way of pricing translations produced with a TM system in place. Nowadays it is common practice to require the use of a TM system within project management and then apply and/or request discounts for previously translated repetitions and fuzzy matches. It is a well-known fact that introducing CAT has had an impact on translators’ remuneration and turnover, which, in its turn, has influenced the evolution of such tools. Commercial TM systems tend to favour primarily what could be automated for translation service providers. Consistency, efficiency, and automation frequently lead to productivity, but not necessarily to quality (Bowker 2005; Guerberof 2009). Apart from the aforementioned shortcomings, the TM segment-based, decontextualized approach can also compromise the overall readability of the resulting translated text, especially as regards terminology, collocations, and style.

The issues mentioned above are perhaps the major drawbacks that translators experience when using TM systems. But not all translators exhibit the same degree of comfort with or awareness of CAT tools (see section 35.5). Many of them are unaware of recent technical advances or simply resist using automated translation tools on the grounds of poor quality of the output, budgetary restrictions, and time investment. The picture is further complicated by the fact that translators tend to resort to other electronic resources during the information seeking/checking phases, either in a stand-alone fashion or combined with TM and MT systems. Bilingual search engines, multilingual electronic dictionaries, corpora and concordancers are examples of less automated resources that also fall under the umbrella term of translation technologies. These online resources and applications are the immediate result of present-day digital technology and globalisation. Together with MT and CAT tools, these resources play an essential role in assisting, optimising and automating the translation process.

35.4.1 Search engines and general reference

Despite the recent technological drive within the industry, translators still prefer to manually consult online resources instead of relying exclusively on automated tools. In this section, we will deal with the kind of free, web-based electronic resources frequently used by translators (see section 35.5). Those resources are lexical in nature, term-orientated or based on cross-language information retrieval. We do not intend to offer a comprehensive account of resources. Instead, we will illustrate the common documentary needs associated with the translation of a specialized text and the main resources available. For the sake of argument, this section will focus on English–Spanish translation within the scientific and medical domains (see also Chapter 45).

A preliminary phase of any translation task involves terminology and documentary searches. Translators have at their disposal a myriad of Internet-based general reference resources, such as specialized searchable databases (‘invisible Web’), crawler-based gateways, portals, human-powered directories, and websites, termbanks, dictionaries and glossaries, directories of dictionaries (monolingual and bi-/multilingual), etc. In the healthcare context, a starting point would be specialised portals, directories, and databases, as well as search and metasearch engines for locating scientific and medical information, such as HealthFinder1, HON MedHunt2, RxList3, Web Directory4, WorldWideScience5, a global science gateway underpinned by deep web technologies, eHealthcarebot6 and its companion subject tracer information blog7 or iMediSearch8, that clusters results and enables customisation by user types (general public, physician, pharmacist, nurse and allied health).

Parallel texts can be located and retrieved from such multifaceted directories and portals or from online freely accessible directories of scientific periodicals, textual databases, virtual libraries, specialised websites, professional or academic associations and open-access initiatives. Some good examples are Free Medical Journals9; OmniMedicalSearch10, MEDLINE Plus by the National Library of Medicine through PubMed in Spanish11; virtual libraries in Spanish like SciELO-Scientific Electronic Library Online12; and other open-access initiatives like the scientific e-journals portal e-Revistas. Plataforma Open Access de Revistas Científicas13 and DOAJ-Directory of Open Access Journals14.

Documentary searches on terminology are a very important step at any phase of the translation process. Single dictionaries and lists of dictionaries for a given language in a specific domain can be located through search, metasearch, and multisearch engines. A serious drawback of dictionaries and other lexical resources on the Internet is their overall quality. Not all the terms included have been validated by experts or can be fully trusted. This is the reason why termbanks created by ‘official’ bodies rather than dictionaries created only by Internet users are preferred by translators. Some well-known multilingual termbanks are IATE – InterActive Terminology for Europe15, UNTERM– United Nations Multilingual Terminology Database16, UNESCOTERM17, Termium18 and EuroTermBank19.

However, translators also resort to specialised glossaries within websites, like MedTerms, the medical dictionary for MedicineNet20 and the dictionaries available from the Spanish Royal Academy of Medicine21, as well as to directories of dictionaries in order to save time, e.g., Lexicool22, Glossary Links23, Terminology Forum24, Sitoteca25, the GlossPost glossaries26 and the Wikipedia glossaries27, to name but a few.


Figure 35.4. Screenshot for ‘stem cell’ by BabelNet

In recent years, Wikipedia28 and its many language editions (Multilingual Wikipedia) have become a rather popular resource among professional translators (see section 35.5). Some NLP applications have been developed to search the (multilingual) content in Wikipedia. For instance, Tradupedia29 finds equivalent entries, concepts and terms: it provides célula madre as equivalent of stem cell. A similar NLP system that maps encyclopaedic entries to a computational lexicon automatically is BabelNet30. This multilingual encyclopaedic dictionary and semantic network incorporates several resources, namely Wikipedia, Wikidata, Wiktionary, OmegaWiki, Wikiquote, VerbNet, WordNet, WoNeF, ItalWordNet, Open Multilingual WordNet, ImageNet, WN-Map and Microsoft Terminology (Navigli and Ponzetto 2012). BabelNet (version 3.7) covers 271 languages and provides rich information for each query search: monolingual definitions (glosses), translation equivalents, concepts (Babel synsets), pronunciations, illustrative images (captions), Wikipage categories, multiword units, complex and related forms (akin to ontologies), as well as an external link to DBpedia (see Figure 35.4). All concepts are cross-referenced in (Multilingual) Wikipedia and can be searched individually. Ontological resources such as these enable translators to obtain a structured preliminary vision of the domain (akin to ontologies, such as types, therapies, provenance of stem cells), to identify core terms and multiword units, to assess the degree of concept correspondence between the source language and the target language terms and to establish other potential equivalents, e.g., stem cell treatments ≈ tratamientos con células madre; cell culture ≈ cultivo celular; induced pluripotent stem cells ≈ células iPS etc.


35.4.2 Look-up Tools and CLIR Applications

Some directories of dictionaries are, in fact, metadictionaries that incorporate a search engine that allows users to perform a multiple search query for a given term in several dictionaries (and other lexical resources) simultaneously and retrieve all the information in just one search results page (resource look-up). Wordreference31 is a popular metadictionary for free online Oxford dictionaries (monolingual and bilingual), grammars as well as fora. For example, the search query ‘stem cell’ into Spanish yields one results page with translation equivalents (célula madre or célula primordial or célula troncal), ‘principal translations’ i.e. preferred translations (célula madre), specialized sense in the domain (‘biology: self-renewing cell’), run-ons (~ ~ research investigación de las células madres or troncales or primordiales), bilingual examples, forum discussions in the medical domain with stem cell in the title as well as external links to images, sample contexts, synonyms, and the like.

In a similar fashion, Diccionarios.com32 caters for multiple term search queries in Larousse and Vox dictionaries (monolingual and bilingual with Spanish), whereas Reverso33 searches Collins dictionaries, as well as grammar checkers and the Internet (websites, images, e-encyclopedias, etc.). It also includes collaborative bilingual dictionaries and a free MT system. Similar hybrid applications are Glosbe34, a huge multilingual dictionary database combined with an on-line translation memory, and Word2Word35, which incorporates metadictionaries, corpora and language search engines, free MT and other language services. Yourdictionary36 can search up to 2,500 lexical resources (dictionaries, glossaries, thesauri, termbanks, corpora, etc.) for 300 languages in one go. Onelook37 indexes over 900 dictionaries (general, specialized, monolingual, and bilingual) and caters for exact and approximate queries by means of wildcards. ProZ.com is a search directory of glossaries and dictionaries built by and for professional translators which performs multiple searches for medical, legal, technical, and other specialized terms38. Like metadictionaries, some termbanks, portals and directories also allow for resource look-up. This is the case of TermSciences39 and FAO Term Portal40.

MagicSearch41 is a customisable multilingual metasearch engine that retrieves one-page results from multiple sources (dictionaries, corpora, machine translation engines, search engines). InterTerm42 is another hybrid system that performs look-up searches for terms and multiword terms in dictionaries and glossaries (general, specialized, monolingual, bilingual English–Spanish and French–Spanish), specialized translations, metadictionaries, termbanks, e-books, websites, Wikipedia and even cross-language web-based applications (see below). For instance, a search for histocompatibility complex in bilingual sources (English–Spanish) accesses 210,188 ‘translation pairs’ and yields one direct result (complejo de histocompatibilidad) and twenty-five ‘partial’ (approximate) results with complex as a noun phrase head (aberrant complex ≈ complejo aberrante) or as a pre-modifier (complex adaptive system ≈ sistema adaptativo complejo).

Finally, IntelliWebSearch43 is a comprehensive resources look-up tool that can be tailored to translators’ needs. The rationale behind it is to speed up the terminology look-up process. So, instead of having to consult different resources in a linear fashion, the tool enables users to select up to fifty electronic resources (web-based, on CD-Rom or installed on the hard disk) that will be searched for a particular terminology check/research in one go. The tool can be customized not only as regards the selection of resources (search settings), but also as regards the choice of the interface language and the shortcut key combinations (programme settings). IntelliWebSearch can be downloaded and executed as a desktop programme. Users simply have to select a word sequence in a given text, press the shortcut keys, and a search window will appear on the computer screen with the copy-and-paste sequence.

A second major category of applications used by translators in their daily work consists of bi-/multilingual systems which are closely related to cross-language information retrieval (CLIR) or multilingual information retrieval (MIR). CLIR systems enable users to pose queries in one language and retrieve information in a language different from that of the user’s query (see Chapter 36). MIR systems are a variety of CLIR systems with the peculiarity that the document collection is multilingual. One example of such a tool is PatentScope®44, the search engine of the World Intellectual Property Organization. The PatentScope engine enables users to search international and national patent databases. The cross-lingual supervised expansion search allows for queries in one of several languages in the domains selected. The results retrieved will be in a language different from that of the user’s query. Also, in this case, the user’s query and its synonyms can be translated into other languages. For example, the translator may want to look for equivalents for stem cells or simply check whether células madre and possible synonyms (células troncales, células primordiales) are valid equivalents for stem cells. In both cases, he or she will have to restrict the search to the medical technology ([MEDI]) domain. In the first case, the user will input the query in English and retrieve as results the translated equivalent terms células madre, células primordiales, and células totipotenciales. In the second case, the translator will input the query in one language (Spanish) and will get the query results in the other (English). The system also enables the user to translate most of the sample sentences and the patent titles by activating the ‘Show translation tool’.
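The general principle of cross-language retrieval with query expansion can be sketched as follows. This Python example is not a description of how PatentScope works internally: the lexicon, the three Spanish sample documents, and the function name are all invented, and real systems rely on far larger translation resources and on proper indexing rather than substring tests.

```python
# Hypothetical bilingual lexicon: an English query term mapped to its
# Spanish equivalent and synonyms (taken from the discussion above).
LEXICON = {
    "stem cells": ["células madre", "células troncales", "células primordiales"],
}

# Invented Spanish document collection.
DOCUMENTS = [
    "Las células madre pueden diferenciarse en distintos tipos celulares.",
    "El trasplante de células troncales se emplea en hematología.",
    "La patente describe un nuevo motor eléctrico.",
]


def cross_language_search(query, lexicon, documents):
    """Translate and expand the query, then search the target documents.

    Returns (document, matched_variants) pairs for every document that
    contains at least one target-language variant of the query.
    """
    variants = lexicon.get(query.lower(), [])
    hits = []
    for doc in documents:
        text = doc.lower()
        matched = [variant for variant in variants if variant in text]
        if matched:
            hits.append((doc, matched))
    return hits


for document, matched in cross_language_search("stem cells", LEXICON, DOCUMENTS):
    print(matched, "->", document)
```

The list of matched variants also gives the translator a rough indication of which equivalents are actually attested in the target-language collection.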


Figure 35.5. Conceptual map for ‘célula madre’ by Mindpedia.

Bilingual search engines are essentially MIR systems which use seed words in one language in order to retrieve bilingual documents in the two languages involved. A basic example is Sobotong45, a dual language search engine that enables users to type in the query sequence in one language and to get results both in the source language and the target language selected. Results are retrieved in a monolingual, single-page fashion; and they can be further filtered through file formats (.html/.xml, .doc, .txt, .ppt, .excel, .rtf). A more sophisticated example of such bilingual search engines is 2Lingual46. Powered by Bing and Google, it searches for documents, websites, and portals of a similar content in two separate languages, in a similar fashion to Bilingual Suggest Beta, the Google Chrome extension which is no longer available. 2Lingual features real-time search suggestions, a query translation option for cross-lingual searches, spelling corrections, cached pages, and related search links. This application opens new search possibilities for translators who want to speed up the collection of traditional ‘parallel texts’. It works like any other monolingual search engine. For example, to search for mesenchymal stem cells, one enters the sequence in the search box (English as the query language), selects the desired language combination (pages written in English and Spanish, in this case), and presses the ‘2Lingual Search’ button. The automatic query translation feature translates the English query into Spanish (células madre mesenquimales). The Spanish equivalent, in turn, serves as the seed sequence for the Spanish monolingual search. Results are displayed in two columns: English on the left side and Spanish on the right. So, the initial query is in English but information is retrieved in both languages (see Figure 35.5).

At the end of each column, there is a list of suggested related searches for each query sequence. These are single lexical units or, more frequently, n-grams that translators can use as indexing descriptors for further searches or even as multiword unit candidates. Searches can be done in a recursive fashion by clicking on the suggested related searches for the query sequences in each language. In this case, for mesenchymal stem cells, 2Lingual displays mesenchymal stem cell transplantation, mesenchymal stem cells clinical trials, mesenchymal stem cells cord blood, mesenchymal stem cells markers, etc. The suggested searches for Spanish contain numerous typos and tend to be less accurate (e.g. celulas [sic] madre totipotenciales, que [sic] es una celula [sic] madre).

35.4.3 Corpora

Nowadays, translators have started to use corpora (see Chapter 19) in their daily work (see section 35.5). There are free, web-searchable corpora for both English and Spanish: the BYU-BNC (Davies 2004–), the Corpus of Contemporary American English (COCA) (Davies 2008–), the Corpus of Global Web-Based English (GloWbE) (Davies 2013–), the Reference Corpus of Contemporary Spanish (CREA) (Real Academia Española, n.d.) and the Spanish Corpus at BYU (Davies 2002–), among others.

However, such corpora would be too general to retrieve sufficient or accurate results when translating specialized texts. For this reason, translators tend to build their own corpora, tailored to their specific needs. After all, using IR systems for documentary searches is just a preliminary step towards effective gathering of data by way of a unitary ‘corpus’. Corpus compilation can be made quite simple with the help of bilingual web-based IR applications and search engines. Let us go back to the searches performed with 2Lingual. To compile a bilingual comparable corpus, the translator can follow three easy steps: (1) collect all the URL addresses retrieved by 2Lingual for both languages; (2) download all the English documents in a single file or in several files (the ‘mesenchymal stem cell’ corpus); and (3) download all the Spanish documents (the ‘células madre’ corpus). Another possibility is to use any of the specialized (metasearch) engines to compile comparable corpora in both languages by seeding them with indexing terms in one language (say, English) and the corresponding indexing terms in the other language (say, Spanish). A third option would be to automate corpus compilation by means of NLP applications such as BootCat47, WebBootCat/Sketch Engine48 and Scrapbook49. The BootCat toolkit and WebBootCat - Sketch Engine (its web version) automatically compile specialized corpora and extract terms from the Web by using lists of keywords within a given domain as input. Scrapbook is a Firefox extension which enables users to save and manage collections of web pages and to perform full text searches (as well as quick filtering searches).

Once the corpus (or corpora) has been assembled, translators need software for concordancing and text analysis. A concordancer searches through a corpus and identifies parts of it that match a pattern that the user has defined. Some concordancers can even search for phrases, do proximity searches, sample words, and do regular expression searches. The results are usually displayed as concordance lines in KWIC (keyword in context) format. Stop lists let users specify words to be omitted from the concordances. Most concordancers also allow browsing through the original text and clicking on any word to see every occurrence of that word in context. From a concordance, the concordancer can straightforwardly calculate what words or structures occur with the pattern (collocations, patterns, n-grams), how frequently they occur (word frequency lists and indexes), and in what position relative to the pattern (n positions to the left/right of the node). Other common functionalities include textual statistics (word types, tokens and percentages, type/token ratio, character and sentence counts, etc.). Less commonly found utilities are lemmatization and part-of-speech tagging.
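The core operations just described can be illustrated with a minimal sketch in Python. It is not the implementation of any of the concordancers mentioned in this chapter: the tokenizer is deliberately naive, and the function names, the `width` parameter, and the sample text are all assumptions introduced for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, node, width=3, stop_words=frozenset()):
    """Return (left context, node, right context) for each occurrence of `node`."""
    if node in stop_words:
        return []  # words on the stop list are omitted from the concordance
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, position, stop_words=frozenset()):
    """Count the words found `position` tokens from the node (-1 = 1-L, +1 = 1-R)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        j = i + position
        if tok == node and 0 <= j < len(tokens) and tokens[j] not in stop_words:
            counts[tokens[j]] += 1
    return counts


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented sample text standing in for a DIY corpus.
SAMPLE = ("Mesenchymal stem cells are multipotent cells. "
          "Stem cells can renew themselves. "
          "The stem cells were isolated from cord blood.")

if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    for left, node, right in kwic(tokens, "stem", width=2):
        print(f"{left:>20} | {node} | {right}")
    print(collocates(tokens, "stem", +1).most_common(3))
    print(f"type/token ratio: {type_token_ratio(tokens):.2f}")
```

Sorting the KWIC lines by the word immediately to the left or right of the node, as concordancers typically allow, would amount to sorting the returned tuples on the last word of the left context or the first word of the right context.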

Some open-source and/or freeware concordancers used for monolingual corpus analysis are AntConc50 and TextStat51. For parallel corpora (bilingual or multilingual) there are a couple of freeware concordancers, such as CasualPConc52 and CasualMultiPConc, its multilingual version, which can handle up to five languages.

There are also text analysis platforms that allow the uploading of both documents and URL addresses, e.g., Spaceless53 and Turbo Lingo54. More sophisticated examples are Compleat Lexical Tutor55 and TAPoRware56. Compleat Lexical Tutor 6.2 is an online platform for data-driven language learning on the Web which provides access to several corpora and also enables users to upload texts or manage linked corpora. This versatile platform integrates concordancers, range, and phrase extractors (n-grams) and offers text comparison and basic statistics functionalities. TAPoRware 2.0 is a suite of tools for text analysis that support files in .html, .xml, and .txt (ASCII) formats which can be either stored in a computer or accessed from the Internet. Among its many functionalities, TAPoRware can (a) produce wordlists in different orders, (b) find words, collocations, dates, patterns and fixed expressions, (c) display the results in KWIC format or in other different ways, (d) perform basic statistics, and (e) provide word distribution and compare any two texts.

Figure 35.6. Screenshot of bilingual search for ‘mesenchymal stem cells’ by 2Lingual.

Recent technological advances have enabled translators to access not only a handful of websites but the whole Internet as a gigantic corpus. Monolingual web concordancers search the Web and display results by means of concordance lines in KWIC format. WebCorp57 mines the Web and produces a concordance display which is sortable. It can perform searches for words, phrases, or patterns (wildcards and groups of characters in square brackets and separated by the pipe character) in any language. Searches can be word-filtered, as well as restricted by domain, country, and time span. Results also include collocations and patterns. Figure 35.6 shows the results in KWIC format sorted by one word to the left for the query ‘gene therapy’ extracted from British websites in the health domain.
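To make the idea of pattern-based searching more concrete, the following Python sketch translates a simplified query pattern into a regular expression. The syntax handled here (`*` as a wildcard and alternatives listed in square brackets separated by the pipe character) is an assumption modelled loosely on the description above; it should not be read as the actual query language or implementation of WebCorp, and the function name is hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a simplified query pattern into a compiled regular expression.

    Assumed syntax (hypothetical, for illustration only):
      *        any run of zero or more word characters
      [a|b|c]  exactly one of the alternatives a, b, or c
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(r"\w*")
            i += 1
        elif ch == "[":
            end = pattern.index("]", i)  # raises ValueError if the bracket is unclosed
            options = pattern[i + 1:end].split("|")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            parts.append(re.escape(ch))  # literal characters are matched as-is
            i += 1
    # Word boundaries keep the pattern from matching inside longer words.
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    text = "Gene therapy and gene therapies differ from genetic therapy."
    print(pattern_to_regex("gene therap*").findall(text))
    print(pattern_to_regex("r[u|a]n").findall("He ran; they run in the rain."))
```

A real web concordancer would apply such a pattern to pages retrieved from a search engine rather than to a single string; the sketch covers only the matching step.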

Glossanet58, KWICfinder59 and Corpus Eye60 function in a similar way, although they do not allow for sophisticated processing of the documents accessed and retrieved.

Figure 35.7. KWIC lines sorted to 1-L for the query ‘therapy’ by WebCorp.

Bilingual websites (original texts and their translations) can also be automatically retrieved and processed by bilingual web concordancers. Original texts and their translations turn into parallel corpora that offer translated segments similarly to TM systems. WeBiText61, MyMemory62, Linguee63 and Glosbe64 are examples of this sort of NLP application. WeBiText retrieves translations of words and expressions from a list of predefined and/or selected websites used as a parallel corpus. The application displays the results as aligned segments and allows for side-by-side viewing of both web documents. It also provides access to Termium, the Canadian termbank (see Figure 35.7).

Figure 35.8. Results of query for ‘mesenquimal’ by Web Concordancer.

Finally, Linguee combines a bilingual dictionary and a bilingual web concordancer that provides fuzzy matches at the sentential or sub-sentential levels, as if it were a TM system. The Web as a comparable or a parallel gigantic corpus provides translators with instant information on terminology, collocations and patterns, definitions, related concepts, style and text conventions in both SL and TL, as well as examples of how words and phrases are used and translated in context. Figure 35.8 depicts an example of a search in Linguee.
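The notion of a fuzzy match at the sentential level can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the character-based similarity ratio of the standard library's `difflib.SequenceMatcher`, which is not claimed to be the measure used by Linguee or by any TM system; the threshold of 0.7, the function name, and the two segment pairs are invented for the example.

```python
from difflib import SequenceMatcher

# Invented English-Spanish segment pairs standing in for an aligned parallel corpus.
MEMORY = [
    ("Mesenchymal stem cells were isolated from cord blood.",
     "Las células madre mesenquimales se aislaron de la sangre del cordón umbilical."),
    ("Gene therapy is still experimental.",
     "La terapia génica sigue siendo experimental."),
]


def fuzzy_matches(query, memory, threshold=0.7):
    """Return (score, source, target) for stored segments similar to `query`, best first."""
    results = []
    for source, target in memory:
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they diverge.
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            results.append((round(score, 2), source, target))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    for score, source, target in fuzzy_matches(
            "Stem cells were isolated from cord blood.", MEMORY):
        print(score, source, "->", target)
```

The query is not stored verbatim, yet the first pair is still retrieved because most of its source segment coincides with the query. Raising the threshold towards 1.0 would restrict the output to near-exact matches.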

35.5 Translators’ Perspectives on Technology

This section deals with translators’ attitudes to and their use and awareness of translation technologies in general. With this aim, the results of various surveys on the use of technologies by professional translators and other industry agents will be presented. Within the FP7 project TTC (Terminology Extraction, Translation Tools and Comparable Corpora, ICT-2009-248005), the 2010 TTC survey was conducted through an online questionnaire about terminology and corpus practices with the aim of identifying needs in the translation and localization industry (Gornostay 2010; Blancafort et al. 2011). 139 language professionals from thirty-one countries answered more than forty questions about (a) the practical use of terminology management tools, (b) the use of MT and CAT tools, and (c) the use of corpora, corpus tools, and NLP applications.

The TTC survey showed that 74% of the respondents are using automated translation tools. Most of them focus on CAT, particularly localization tools and TM systems like Trados or Similis, and to a lesser extent, on MT (only 9%): both commercial systems (Language Weaver or Systran) and free online software, like Google Translate, which is the most popular one among respondents. The principal reasons for the limited use of MT systems are their high prices and the low quality of the translated output, which makes it unsuitable for specific domains. Even though recent surveys have revealed rapid growth of MT use (cf. Torres-Domínguez 2012; Doherty et al. 2013; Zaretskaya et al. 2015, 2016), TM and term management tools remain translators’ preferred TEnT components.

This situation is in line with the findings of former studies. The LISA 2004 Translation Memory Survey (Lommel 2004) aimed at describing the TM technology uptake of translation and localization companies. More than 270 companies worldwide filled in the online questionnaire which covered issues related to translation volumes, usage rates, and repository sizes of TMs, choice of CAT tools, the role of standards, and future trends in TM implementation. The survey revealed an expanding TM market in which companies had initially introduced TM technology as a means to increase revenue in localization. Later on, the scope was widened to other types of translation projects as a strategy to gain market advantage through reduced costs, increased quality, and a faster time-to-market. The survey also showed that the majority of companies were planning to extend the use of TM technologies, although the market was heavily dominated at that time by a handful of TM tools, namely Trados, followed by SDLX, Déjà Vu, and Alchemy Catalyst.

There has been a steady increase in the use of CAT tools in the last decade, as compared with previous surveys. In 2004, a survey on translation technologies adoption by UK freelance translators was carried out by means of a mailed questionnaire (Fulford and Granell-Zafra 2005). The main conclusion of this survey is that the penetration rate of general-purpose software (word processing, desktop publishing, etc.) among UK freelancers is higher than the uptake of special-purpose software applications (TM and TMS tools). Only 28% of the 439 respondents had a TM system in place (Trados, Déjà Vu, SDLX, and Transit) and almost half of them were unfamiliar with those tools. A very small percentage (2%) used localization tools (Alchemy Catalyst and Passolo), whereas only 5% were using MT systems. A survey conducted two years later depicts a different scenario: 82.5% of translators are already using TM systems (Lagoudaki 2006). Translators who work with repetitive, voluminous texts and translators who specialize in technical, financial, and marketing texts are more likely to use TM technology. All three surveys revealed a strong correlation between translators’ IT proficiency and translators’ uptake of TM technology.

Gouadec (2007) found that in 95% of the more than 430 job advertisements for professional translators that he surveyed, experience with translation memory systems was mentioned as a prerequisite. Meanwhile, a series of biennial surveys carried out by a major professional translators’ association in Canada—the Ordre des traducteurs, terminologues et interprètes agréés du Québec—reveals that in 2004, 37.7% of the 384 respondents indicated that they used a translation memory system. By 2012, this number had nearly doubled to 69.6% (for 336 respondents). Finally, LeBlanc’s (2013) recent ethnographic study of translators, where he observed and interviewed over fifty professional translators in their workplace, confirms that many translators do find the use of translation memory tools to be a key factor in increasing their productivity while maintaining quality (see also section 35.2.2).

According to the survey conducted by Zaretskaya et al. (2015, 2016), the percentage of TM users seems to have decreased (76%) nowadays, but it still remains much higher than for other types of technologies, e.g. MT (36%), standalone MT (13%), integrated MT (35.5%), quality assurance (60%), etc. Another interesting finding is the diversification of CAT tools translators tend to use on a daily basis.

The evolution of technology is also directly responsible for a recent trend within the translation industry as regards adoption rates of TM systems. According to the Translation Industry Survey 2010/2011 (TradOnline, 2011), the sharing of translation memories (and terminological resources), together with automated translation, are the most important new technologies and processes that have appeared in the translation industry over the last fifteen years. Of the respondents, 51% see the sharing of translation memories as an opportunity, while 34% see it as a risk. This leads us to the 2011 TAUS/LISA survey on translation interoperability, or, in other words, to the key issue of doing business with multiple TSPs who use a variety of tools. Among the 111 respondents to the survey, there were language service providers (41.8%), translators (8.2%), language technology providers (7.3%), and buyers of translation (30%). More than 50% think that the industry’s failure to exchange TM and terminology in a standard format increases their business expenditures. Interoperability also covers the integration of translation software with content and document management systems. The main technology areas that face interoperability issues are the following: translation memory (80.7%), terminology management (78%), content management systems (67.9%), translation management systems or global management systems (66.1%), localization workbench and CAT tools (60.6%), quality assurance and testing (48.6%), machine translation (45.9%), and online and cloud-based resources, e.g. shared TM and terminology (30.3%). According to the survey, interoperability could improve efficiency, increase revenue, and improve translation quality. However, there are still serious obstacles to achieving interoperability. Lack of compliance with interchange format standards (TMX, TBX, XLIFF, etc.), legal restrictions and confidentiality of information, lack of maturity in the translation industry, or budgetary restrictions are some of the stumbling blocks mentioned in the survey for a wider adoption of interoperability standards.

Concerning terminology extraction and management tools, the situation has remained almost the same since the former industry surveys. Translators strive to ensure terminology consistency and enhance productivity at the same time. However, a high percentage of translation service providers do not systematically manage terminology and, when they do so, they simply resort to the terminology tools integrated in TMS, as already pointed out in the LISA 2004 survey. In addition, the 2004 UK survey reported by Fulford and Granell-Zafra (2005) showed that only 24% had TMSs in place (Multiterm, Lingo, and TermWatch), whereas half of them were not familiar with those tools at all. Moreover, the most common resources used by translators for manual search of terms and terminology research were Internet search engines (85%), online dictionaries and glossaries (79%), multilingual terminology databanks (59%), textual archives (51%), and online encyclopedias and academic journals (30%).

In the same vein, SDL ran two surveys on trends and opinions about terminology work in the translation and localization industry (SDL 2008). The first questionnaire received 140 responses about the effects that terminology has on business branding and customer satisfaction. The second questionnaire was completed by 194 localization and translation professionals, who provided their own perspective on the use and management of terminology. Only 31% of translators use a specific terminology management tool. Instead, they tend to use terminology lists in Excel spreadsheets (42%), publish them in style guides (6%) or simply create and circulate them via e-mail (6%). Only 10% of translators use terminology extraction tools; most of them (84%) continue to select terms manually from documents.

The 2010 TTC survey also corroborates those trends. The majority of respondents (56%) spend 10–30% of their time working with terminology. The terminology tasks performed focus on bilingual research and term collection. Lexical work seems to be the main task concerning terminology search (22.4%), e.g. definitions, translation equivalents and the like, followed by grammar, contextual and usage information, among others. Terminological research is basically performed by means of internal, client, and online resources. Apart from Internet searches, respondents make extensive use of termbanks, portals, and gateways (35%). IATE, EuroTermBank, and Microsoft Language Portal appear to be the most popular. However, practitioners do not tend to use term extractors or terminology management tools; they continue to perform manual searches and still use spreadsheets (Excel) and Word documents as the main means of storing and exchanging terminology. Respondents often mention budget and/or time constraints, and deficiencies in the functionalities of existing tools, as reasons for such an unbalanced picture. Nowadays, in addition to the frequently used online resources, translators tend to perform most terminology searches with tools integrated in TM systems: terminology management (58%), terminology extraction (25%), and bilingual parallel corpora (66.7%) (Zaretskaya et al. 2015, 2016).

The 2010 TTC survey shows that translators are increasingly using other types of resources for terminology extraction and research. Half of the respondents also collect corpora of the relevant domain within the core areas of their translation expertise. However, only 7% use concordancers and other NLP tools (mainly POS taggers) for corpus processing. The vast majority of translators still prefer to skim texts, highlight relevant terms, and perform manual searches of equivalents. In any case, corpus compilation is perceived as a time-consuming task and NLP and corpus tools remain largely unknown. Those results are in line with Durán Muñoz (2012). Translators tend to compile ad hoc corpora when translating, in order to check terms in context (meanings, usage, register, and style) and to extract terms to populate their own termbanks. Respondents mainly compile parallel corpora (36.02%) of original texts and their translations, bilingual comparable corpora (18.63%) of original texts in both languages, and, less frequently, monolingual comparable corpora (10.56%) of original texts in the source or the target language. Although 65.21% compile their own corpora, NLP tools are not mentioned at all and only 14.29% seem to use some kind of corpus management and processing tools (e.g. WordSmith Tools). Finally, 34.78% of respondents do not compile corpora when translating mainly due to lack of time, because they do not find those resources useful or, simply, they are unaware of their existence.

Although most translators use corpora for reference purposes and as a translation aid (cf. Torres-Domínguez 2012), not all of them seem to be familiar with special tools for creating and managing their own corpora. According to Zaretskaya et al. (2015), there has been a slight decrease in translators’ use of corpora (15%) and corpus tools (17%). The authors argue that low percentages in corpus use are partially due to the wording of the questionnaire, as translators admit to using all kinds of reference texts which they do not necessarily classify as ‘corpora’. For instance, most translators use TMs as parallel corpora to search for translation equivalents or create TMs from parallel texts.

Translators show a varying degree of awareness and familiarity with different translation technology tools and resources. Newly qualified translators, translators specialized in technical, financial, and marketing texts, and translators with TM experience and IT proficiency seem to show a more positive attitude towards TM technology (cf. Lagoudaki 2006). Yet, it comes as a surprise for industry, software developers, and researchers alike that TM technology and other CAT tools are less widely used than one might expect. As mentioned before, MT systems are seldom used, and TMSs and term extractors are also rare in the daily work of translators, who still prefer to perform manual terminology search, research, storage, and exchange. In contrast, translators find themselves quite comfortable with regard to Internet resources; they tend to compile DIY corpora and they seem to be particularly knowledgeable with regard to information mining. Fulford and Granell-Zafra (2008) point out that freelance translators tend to integrate web-based resources and services in their daily translation workflow, such as online dictionaries, terminology databases, search engines, online MT systems, Internet services, etc. The same applies to other types of professional translators and translation students alike (cf. Enríquez Raído, 2014). From this stance, information and technology (IT) competence and translation technologies would refer to tools, whether standalone or integrated in a TEnT, as well as to web-based resources and applications. Other generic tools and Internet services, while relevant to the professional translator, could not be considered ‘translation technologies’ proper, unless one adopts an extremely broad conception (as in Alcina 2008). At most, they could probably fall under the vague and general umbrella of ‘other technologies used by translators’.

35.6 Summary

This chapter has presented an introduction to translation technologies from the point of view of translators. As such, it first introduced several of the tools most widely used in the translation industry today, including Translation Environment Tools, along with their core components, which include translation memory systems and terminology management systems. Additional tools useful for terminology processing, such as term extractors and bilingual concordancers, were also presented. Next, the chapter outlined localization tools, which incorporate TMs and TMSs, but which also include some additional functions to allow translators to adapt digital content. A discussion on other kinds of resources commonly used by translators followed, namely, general-reference e-resources (search engines, directories, portals, metadictionaries, etc.), web-based resources and applications (resources look-up and bi-/multilingual tools), and corpora (DIY, concordancers and the Web as corpus). The chapter concluded with surveys on the use of technologies by professional translators, their tech-savviness, and their IT competence.

Further Reading and Relevant Resources

This chapter provides an overview of a number of key types of tools and resources for translators, but interested readers are encouraged to consult additional sources to find out more. Table 35.1 contains a summary of the basic functions of some of the components most commonly found in TEnTs (see also Kenny 2011; Bowker and Fisher 2012a,b); however, for a more detailed description of these tools and their functionalities, refer to Austermühl (2001), Bowker (2002), Quah (2006), and L’Homme (2008). For a comprehensive account of general-purpose and core-translation software, see Zetzsche (2017). A state-of-the-art overview of CAT tools and machine translation can be found in the volume edited by Chan (2015). On the evaluation of translation technologies, see the special issue of Linguistica Antverpiensia edited by Daelemans and Hoste (2009). Translators’ use of corpora is discussed in the volumes edited by Fantinuoli and Zanettin (2016) and by Corpas Pastor and Seghiri (2016); the latter also covers interpreting. The Web as Corpus Workshops Proceedings by ACL SIGWAC is an excellent source of information on this topic (http://www.sigwac.org.uk/). See also the special issue of Computational Linguistics on the Web as corpus, 29(3), 2003. Nowadays, Web as corpus is turning into gigatoken web corpora, as in the COW (COrpora from the Web) project (Schäfer and Bildhauer 2012, 2013). Austermühl (2001), L’Homme (2004), and Bowker (2011) provide useful insights into computer-aided terminology processing. A comparison of the strengths and weaknesses of TM systems and bilingual concordancers can be found in Bowker and Barlow (2008). Experiences of combining TM systems with machine translation systems are described in Lange and Bennett (2000). Bouillon and Starlander (2009) discuss translation technologies from a pedagogical viewpoint. Christensen and Schjoldager (2010) provide a succinct overview of the nature, applications, and influence of TM technology, including translators’ interaction with TMs. In this line, Olohan (2011) has recently advanced a sociological conceptualization of translator–TM interaction. Until 2011, the Localization Industry Standards Association (LISA) was an excellent resource for information on standards such as Translation Memory eXchange (TMX) and Term Base eXchange (TBX), among others. Since then, two new organizations have arisen in response to the closure of LISA: the Industry Specification Group (ISG) for localization, started by the European Telecommunications Standards Institute, and Terminology for Large Organizations (TerminOrgs), founded by members of the former LISA Terminology Special Interest Group. A detailed overview of the localization process and the role of translation and translation technologies within it can be found in Esselink (2000) and Pym (2004). On the potential impact of crowdsourcing on translation technologies and the translation profession, refer to Abekawa et al. (2010) and García (2010). Gough (2011) surveys professional translators’ attitudes, awareness, and adoption of emerging Web 2.0 technologies (e.g. crowdsourcing, TM sharing, convergence of MT with TM, etc.). See also the latest study by SDL (2016) on the role and the future of technology in the translation industry. Finally, on the closely related topic of technology tools for interpreters, refer to Costa, Corpas Pastor and Durán Muñoz (2014), Sandrelli (2015) and Fantinuoli (2017).

References

Abekawa, Takeshi, Masao Utiyama, Eiichiro Sumita, and Kyo Kageura (2010). ‘Community-based Construction of Draft and Final Translation Corpus through a Translation Hosting Site Minna no Hon’yaku (MNH)’. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds), Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC), Valletta, Malta, 3662–3669. Paris, France: European Language Resources Association. Retrieved from <http://www.lrec-conf.org/proceedings/lrec2010/pdf/243_Paper.pdf>.

Alcina, Amparo (2008). ‘Translation Technologies. Scope, Tools and Resources’, Target 20(1): 79–102.

Austermühl, Frank (2001). Electronic Tools for Translators. Manchester, UK: St. Jerome Publishing.

Benito, Daniel (2009). ‘Future Trends in Translation Memory’, Revista Tradumàtica: Traducció i Tecnologies de la Informació i la Comunicació 7. Retrieved from <http://www.fti.uab.es/tradumatica/revista/num7/articles/07/07.pdf>.

Blancafort, Helena, Ulrich Heid, Tatiana Gornostay, Claude Méchoulam, Béatrice Daille, and Serge Sharoff (2011). ‘User-centred Views on Terminology Extraction Tools: Usage Scenarios and Integration into MT and CAT Tools’. Tralogy [on-line] Session 1: Terminologie et Traduction, 3–4 March. Retrieved from <http://lodel.irevues.inist.fr/tralogy/index.php?id=91&format=print>.

Bouillon, Pierrette and M. Marianne Starlander (2009). ‘Technology in Translator Training and Tools for Translators’. In MT XII: Proceedings of the Twelfth Machine Translation, 26–30 August 2009. Retrieved from <http://www.mt-archive.info/MTS-2009-Bouillon-ppt.pdf>.

Bowker, Lynne (2002). Computer-aided Translation Technology: A Practical Introduction. Ottawa, ON: University of Ottawa Press.

Bowker, Lynne (2003). ‘Terminology Tools for Translators’. In Harold Somers (ed.), Computers and Translation: A Translator’s Guide, 49–65. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

Bowker, Lynne (2005). ‘Productivity vs. Quality: A Pilot Study on the Impact of Translation Memory Systems’, Localisation Focus 4(1): 13–20.

Bowker, Lynne (2006). ‘Translation Memory and “Text”’. In Lynne Bowker (ed.), Lexicography, Terminology and Translation: Text-based Studies in Honour of Ingrid Meyer, 175–187. Ottawa, ON: University of Ottawa Press.

Bowker, Lynne (2011). ‘Off the Record and on the Fly: Examining the Impact of Corpora on Terminographic Practice in the Context of Translation’. In Alet Kruger, Kim Wallmach, and Jeremy Munday (eds), Corpus-based Translation Studies: Research and Applications, 211–236. London and New York: Continuum.

Bowker, Lynne and Mike Barlow (2008). ‘A Comparative Evaluation of Bilingual Concordancers and Translation Memory Systems’. In Elia Yuste Rodrigo (ed.), Topics in Language Resources for Translation and Localisation, 1–22. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

Bowker, Lynne and Des Fisher (2012a). ‘Technology and Translation’. In Carol A. Chapelle (ed.), Encyclopedia of Applied Linguistics, Wiley Online Library. Oxford, UK: Wiley-Blackwell. doi: 10.1002/9781405198431.

Bowker, Lynne and Des Fisher (2012b). ‘Technology and Terminology’. In Carol A. Chapelle (ed.), Encyclopedia of Applied Linguistics, Wiley Online Library. Oxford, UK: Wiley-Blackwell. doi: 10.1002/9781405198431.

Cabré Castellví, M. Teresa, Rosa Estopà Bagot, and Jordi Vivaldi Palatresi (2001). ‘Automatic Term Detection: A Review of Current Systems’. In Didier Bourigault, Christian Jacquemin, and Marie-Claude L’Homme (eds), Recent Advances in Computational Terminology, 53–87. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

Chan, Sin-Wai (ed.) (2015). Routledge Encyclopedia of Translation Technology. Abingdon/New York: Routledge.

Christensen, Tina Paulsen and Anne Schjoldager (2010). ‘Translation-Memory (TM) Research: What Do We Know and How Do We Know It?’, Hermes: The Journal of Language and Communication 44: 1–13. Retrieved from <http://download2.hermes.asb.dk/archive/download/Hermes-44-paulsen%20christensen&schjoldager.pdf>.

Colominas, Carme (2008). ‘Towards Chunk-based Translation Memories’, Babel 54(4): 343–354.

Corpas Pastor, Gloria (2007). ‘Lost in Specialised Translation: The Corpus as an Inexpensive and Under-exploited Aid for Language Service Providers’. In Translating and the Computer 29: Proceedings of the Twenty-ninth International Conference on Translating and the Computer, 29–30 November, London, 1–18. London: Aslib/IMI. Retrieved from <http://mt-archive.info/Aslib-2007-Corpas-Pastor.pdf>.

Corpas Pastor, Gloria and Míriam Seghiri (eds) (2016). Corpus-based Approaches to Translation and Interpreting. From Theory to Applications. Frankfurt/New York: Peter Lang.

Costa, Hernani, Gloria Corpas Pastor, and Isabel Durán Muñoz (2014). ‘Technology-assisted Interpreting’, Multilingual 143: 27–32.

Daelemans, Walter and Véronique Hoste (eds) (2009). Evaluation of Translation Technology. Special Issue of Linguistica Antverpiensia New Series: Themes in Translation Studies, vol. 8.

                                              Davies, Mark (2002–). Corpus del Español: 100 Million Words, 1200s–1900s. Available online at <http://www.corpusdelespanol.org>.

                                              Davies, Mark (2004–). BYU-BNC (Based on the British National Corpus from Oxford University Press). Available online at <http://corpus.byu.edu/bnc/>.

                                              Davies, Mark (2008–). The Corpus of Contemporary American English: 450 Million Words, 1990–Present. Available online at <http://corpus.byu.edu/coca/>.

Désilets, Alain, Benoit Farley, Marta Stojanovic, and Geneviève Patenaude (2008). ‘WeBiText: Building Large Heterogeneous Translation Memories from Parallel Web Content’. In Translating and the Computer 30: Proceedings of the Thirtieth International Conference on Translating and the Computer, 27–28 November 2008, London. London: Aslib/IMI. Retrieved from <http://mt-archive.info/Aslib-2008-Desilets.pdf>.

Doherty, Stephen (2016). ‘The Impact of Translation Technologies on the Process and Product of Translation.’ International Journal of Communication 10: 947–969.

Doherty, Stephen, Federico Gaspari, Declan Groves, Josef van Genabith, Lucia Specia, Aljoscha Burchardt, Arle Lommel and Hans Uszkoreit (2013). QTLaunchPad – Mapping the Industry I: Findings on Translation Technologies and Quality Assessment. European Commission Report. Technical report.

Dunne, Keiran J. (ed.) (2006). Perspectives on Localization. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

Dunne, Keiran J. and Elena S. Dunne (eds) (2011). Translation and Localization Project Management. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

Durán Muñoz, Isabel (2012). La ontoterminografía aplicada a la traducción: Propuesta metodológica para la elaboración de recursos terminológicos dirigidos a traductores. Studien zur romanischen Sprachwissenschaft und interkulturellen Kommunikation 80. Frankfurt, Germany/New York: Peter Lang.

Esselink, Bert (2000). A Practical Guide to Localization. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

Fantinuoli, Claudio (2017). ‘Computer-assisted Interpreting: Challenges and Future Perspectives.’ In Trends in e-tools and resources for translators and interpreters, ed. by Gloria Corpas Pastor and Isabel Durán Muñoz. Leiden/Boston: Brill.

Fantinuoli, Claudio and Federico Zanettin, eds. (2016). New directions in corpus-based translation studies. Berlin: Language Science Press. Retrieved from <http://langsci-press.org/catalog/book/76>.

Fulford, Heather and Joaquin Granell-Zafra (2005). ‘Translation and Technology: A Study of UK Freelance Translators’, The Journal of Specialised Translation (JoSTrans) 4: 2–7. Retrieved from <http://www.jostrans.org/issue04/art_fulford_zafra.php>.

Fulford, Heather and Joaquin Granell-Zafra (2008). ‘The Internet in the Freelance Translator’s Workflow’, International Journal of Translation (IJT) 20(1–2): 5–18.

Gambín, José (2014). ‘Evolution of cloud-based translation memory.’ MultiLingual 25(3): 46–49.

Garcia, Ignacio (2007). ‘Power Shifts in Web-based Translation Memory’, Machine Translation 21(1): 55–68.

Garcia, Ignacio (2009). ‘Beyond Translation Memory: Computers and the Professional Translator’, The Journal of Specialised Translation (JoSTrans) 12: 199–214. Retrieved from <http://www.jostrans.org/issue12/art_garcia.php>.

Garcia, Ignacio (2010). ‘The proper place of professionals (and non-professionals and machines) in web translation.’ Tradumàtica – Traducció i Tecnologies de la Informació i la Comunicació 8. Retrieved from <http://www.fti.uab.es/tradumatica/revista/num8/articles/02/02art.htm>.

Gerasimov, Andrei (2007). ‘A Comparison of Translation QA Products’, MultiLingual 18(1): 22.

Gómez Palou, Marta (2012). ‘Managing Terminology for Translation Using Translation Environment Tools: Towards a Definition of Best Practices’. Unpublished PhD thesis, University of Ottawa, Ottawa, Ontario. Retrieved from <http://www.ruor.uottawa.ca/fr/bitstream/handle/10393/22837/Gomez_Palou_Allard_Marta_2012_thesis.pdf>.

Gornostay, Tatiana (2010). ‘Terminology Management in Real Use’. In Proceedings of the 5th International Conference of Applied Linguistics in Science and Education, 25–26 March, St Petersburg, Russia.

Gouadec, Daniel (2007). Translation as a Profession. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

Gough, Joanna (2011). ‘An Empirical Study of Professional Translators’ Attitudes, Use and Awareness of Web 2.0 Technologies, and Implications for the Adoption of Emerging Technologies and Trends’, Linguistica Antverpiensia 10: 195–217.

Gow, Francie (2007). ‘You Must Remember This: The Copyright Conundrum of “Translation Memory” Databases’, Canadian Journal of Law and Technology 6(3): 175–192.

Guerberof, Ana (2009). ‘Productivity and Quality in the Post-editing of Outputs from Translation Memories and Machine Translation’, Localisation Focus 7(1): 11–21.

Gupta, Rohit and Constantin Orăsan (2014). ‘Incorporating Paraphrasing in Translation Memory Matching and Retrieval’. In Proceedings of the Seventeenth Annual Conference of the European Association for Machine Translation (EAMT2014), Dubrovnik, Croatia, 3–10.

Gupta, Rohit, Constantin Orăsan, Qun Liu and Ruslan Mitkov (2016a). ‘A Dynamic Programming Approach to Improving Translation Memory Matching and Retrieval using Paraphrases’. In Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924, eds. Petr Sojka, Aleš Horák, Ivan Kopeček and Karel Pala. Heidelberg: Springer, 259–269.

Gupta, Rohit, Constantin Orăsan, Marcos Zampieri, Mihaela Vela, Josef van Genabith and Ruslan Mitkov (2016b). ‘Improving Translation Memory matching and retrieval using paraphrases’. Machine Translation 30: 19–40.

Heylen, Kris and Dirk De Hertog (2015). ‘Automatic Term Extraction.’ In Handbook of Terminology, eds. Hendrik J. Kockaert and Frieda Steurs. Amsterdam: John Benjamins, 203–221.

Hutchins, John (1998). ‘The Origins of the Translator’s Workstation’, Machine Translation 13(4): 287–307. Retrieved from <http://www.hutchinsweb.me.uk/MTJ-1998.pdf>.

Hutchins, John (2005). ‘Current Commercial Machine Translation Systems and Computer-based Translation Tools: System Types and their Uses’, International Journal of Translation 17(1–2): 5–38.

Jiménez-Crespo, Miguel A. (2013). Translation and the Web. London and New York: Routledge.

Jiménez-Crespo, Miguel A. (2017). Crowdsourcing and Online Collaborative Translations. Amsterdam: John Benjamins.

Joscelyne, Andrew and Jaap van der Meer (2007). ‘Translation 2.0: Transmutation’, Multilingual 18(3): 30–31.

Kenny, Dorothy (2011). ‘Electronic Tools and Resources for Translators’. In Kirsten Malmkjær and Kevin Windle (eds), The Oxford Handbook of Translation Studies, 455–472. Oxford, UK: Oxford University Press.

Koehn, Philipp (2010). Statistical Machine Translation. Cambridge, UK: Cambridge University Press.

Koskinen, Kaisa and Minna Ruokonen (2017). ‘Love Letters or Hate Mail? Translators’ technology acceptance in the light of their emotional narratives.’ In Human Issues in Translation Technology, ed. Dorothy Kenny. London: Routledge, 8–24.

                                                                                                                Lagoudaki, Elina (2006). Imperial College London Translation Memories Survey 2006. Translation Memory Systems: Enlightening Users’ Perspective. Retrieved from <http://www.bk.admin.ch/dokumentation/sprachen/04850/05007/05605/index.html?lang=it&download=NHzLpZeg7t,lnp6I0NTU042l2Z6ln1ah2oZn4Z2qZpnO2Yuq2Z6gpJCFdoJ9g2ym162epYbg2c_JjKbNoKSn6A>.

Lange, C. Andres and Winfield S. Bennett (2000). ‘Combining Machine Translation with Translation Memory at Baan’. In Robert C. Sprung (ed.), Translating into Success, 203–218. Amsterdam, The Netherlands/Philadelphia, PA: John Benjamins.

LeBlanc, Matthieu (2013). ‘Translators on Translation Memory (TM). Results of an Ethnographic Study in Three Translation Services and Agencies’, The International Journal of Translation and Interpreting Research 5(2). Retrieved from <http://www.trans-int.org/index.php/transint/article/view/228/134>. DOI: ti.105202.2013.a01.

LeBlanc, Matthieu (2017). ‘“I can’t get no satisfaction”: an ethnographic account of translators’ experiences of translation memory and shifting business practices.’ In Human Issues in Translation Technology, ed. Dorothy Kenny. London: Routledge, 45–62.

Lemay, Chantal, Marie-Claude L’Homme, and Patrick Drouin (2005). ‘Two Methods for Extracting “Specific” Single-word Terms from Specialised Corpora: Experimentation and Evaluation’, International Journal of Corpus Linguistics 10(2): 227–255.

L’Homme, Marie-Claude (2004). La terminologie: Principes et techniques. Montréal: Presses de l’Université de Montréal.

L’Homme, Marie-Claude (2008). Initiation à la traductique, 2nd edition. Brossard, Québec: Linguatech.

Lommel, Arle (2004). LISA 2004 Translation Memory Survey: Translation Memory and Translation Memory Standards. Retrieved 1 May 2011 from <http://www.lisa.org/products/survey/2004/tmsurvey.html>.

Macken, Lieve (2009). ‘In Search of the Recurrent Units of Translation’, Evaluation of Translation Technology. Special Issue of Linguistica Antverpiensia New Series: Themes in Translation Studies, vol. 8: 195–223.

Maia, Belinda (2003). ‘Training Translators in Terminology and Information Retrieval Using Comparable and Parallel Corpora’. In Federico Zanettin, Silvia Bernardini, and Dominic Stewart (eds), Corpora in Translator Education, 43–53. Manchester: St. Jerome Publishing.

Marsye, Aurora (2011). ‘Towards a New Generation Translation Memory: A Paraphrase Recognition Study for Translation Memory System Development’. Unpublished MA thesis, University of Wolverhampton, UK.

Melby, Alan K., M. R. Smith, and J. Peterson (1980). ‘ITS: Interactive Translation System’. In Bernard Vauquois (ed.), Coling 80: Proceedings of the Eighth International Conference on Computational Linguistics, Tokyo, Japan, 424–429.

Navigli, Roberto and Simone P. Ponzetto (2012). ‘BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network’. Artificial Intelligence 193: 217–250.

O’Hagan, Minako (2009). ‘Computer-aided Translation (CAT)’. In Mona Baker and Gabriela Saldanha (eds), Routledge Encyclopedia of Translation Studies, 48–51. London and New York: Routledge.

O’Hagan, Minako and Carmen Mangiron (2013). Game Localization: Translating for the Global Digital Entertainment Industry. Amsterdam and Philadelphia: John Benjamins.

Olohan, Maeve (2011). ‘Translators and Translation Technology: The Dance of Agency’, Translation Studies 4(3): 342–357.

Pekar, Viktor and Ruslan Mitkov (2007). ‘New Generation Translation Memory: Content-Sensitive Matching’. In Proceedings of the 40th Anniversary Congress of the Swiss Association of Translators, Terminologists and Interpreters. Bern: ASTTI.

Pym, Anthony (2004). The Moving Text: Localisation, Translation and Distribution. Amsterdam and Philadelphia: John Benjamins.

Quah, Chiew Kin (2006). Translation and Technology. Houndmills, UK and New York: Palgrave Macmillan.

Real Academia Española (n.d.). Database (CORPES XXI) [online]. Corpus del español del siglo XXI. Available online at <http://www.rae.es>.

Reinke, Uwe (2013). ‘State of the Art in Translation Memory Technology’, TC3: Translation: Computation, Corpora, Cognition, 3(1): 27–48.

Sandrelli, Annalisa (2015). ‘Becoming an interpreter: the role of computer technology’. In Insights in interpreting. Status and development / Reflexiones sobre la interpretación. Presente y Futuro, (MONTI special issue 2), eds. Catalina Iliescu and Juan Miguel Ortega Herráez, 111–138.

Savourel, Yves (2007). ‘CAT Tools and Standards: A Brief Summary’, MultiLingual 18(6): 37.

Schäfer, Roland and Felix Bildhauer (2012). ‘Building Large Corpora from the Web Using a New Efficient Tool Chain’. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis (eds), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 486–493. Istanbul: ELRA.

Schäfer, Roland and Felix Bildhauer (2013). Web Corpus Construction. Synthesis Lectures on Human Language Technologies. San Francisco: Morgan & Claypool.

Schäler, Reinhard (2010). ‘Localisation and Translation’. In Yves Gambier and Luc Van Doorslaer (eds), Handbook of Translation Studies, 209–214. Amsterdam and Philadelphia: John Benjamins.

                                                                                                                                                              SDL (2008). ‘Terminology: An End-to-End Perspective’. SDL Research Paper. Retrieved from <http://www.sdl.com/en/language-technology/resources/research_results/terminology-an-end-to-end-perspective.asp>.

SDL (2009). Trends in automated translation in today’s global business. Retrieved from <http://www.sdl.com/en/language-technology/resources/whitepapers/automated-translation-survey-2009.asp>.

SDL (2016). SDL Translation Technology Insights. Retrieved from <http://www.sdl.com/download/tti16-executive-summary/103386/>.

Somers, Harold (2003a). ‘The Translator’s Workstation’. In Harold Somers (ed.), Computers and Translation: A Translator’s Guide. Amsterdam and Philadelphia: John Benjamins, 13–30.

Somers, Harold (2003b). ‘Translation Memory Systems’. In Harold Somers (ed.), Computers and Translation: A Translator’s Guide. Amsterdam and Philadelphia: John Benjamins, 31–47.

Steurs, Frieda, Ken De Wachter and Evy De Malsche (2015). ‘Terminology Tools.’ In Handbook of Terminology, eds. Hendrik J. Kockaert and Frieda Steurs. Amsterdam: John Benjamins, 222–249.

                                                                                                                                                                    TAUS (2011). Lack of Interoperability Costs the Translation Industry a Fortune. Report on a TAUS/LISA survey on translation interoperability, 25 February. Retrieved from <http://www.translationautomation.com/technology-reviews/lack-of-interoperability-costs-the-translation-industry-a-fortune.html>.

Timonera, Katerina and Ruslan Mitkov (2015). Improving Translation Memory Matching through Clause Splitting. Proceedings of the RANLP’2015 workshop ‘Natural Language Processing for Translation Memories’. Hissar, Bulgaria.

Torres Domínguez, Ruth (2012). The 2012 use of translation technologies survey. Retrieved from <http://mozgorilla.com/en/texnologii-en-en/translation-technologies-survey-results/>.

                                                                                                                                                                      TradOnline (2011). Translation Industry Survey 2010/2011. Retrieved from http://www.tradonline.fr/medias/docs_tol/translation-survey-2010/page1.html.

Wallis, Julian M. S. (2008). ‘Interactive Translation vs. Pre-Translation in TMs: A Pilot Study’, Meta 53(3): 623–629.

Zaretskaya, Anna, Gloria Corpas Pastor and Míriam Seghiri (2015). ‘Translators’ requirements for translation technologies: a user survey.’ In Corpas Pastor, Gloria, Míriam Seghiri, Rut Gutiérrez Florido and Míriam Urbano Mendaña, eds. (2015). Nuevos horizontes en los Estudios de Traducción e Interpretación (Trabajos completos) / New Horizons in Translation and Interpreting Studies (Full papers). Geneva: Tradulex, 247–254.

Zaretskaya, Anna, Gloria Corpas Pastor and Míriam Seghiri (2016). ‘Corpora in computer-assisted translation: a users’ view’. In Corpas Pastor, Gloria and Míriam Seghiri, eds. (2016). Corpus-based Approaches to Translation and Interpreting: from theory to applications. New York/Frankfurt: Peter Lang, 253–276.

Zetzsche, Jost (2006). ‘Translation Tools Come Full Circle’, Multilingual 17(1): 41.

Zetzsche, Jost (2017). The Translator’s Tool Box. A Computer Primer for Translators. Version 13. Winchester Bay: International Writers’ Group.

                                                                                                                                                                                Notes: