Localization and translation
Reinhard Schäler
1. Perspectives
2. Localization: More than just interlingual translation
  2.1 Characteristics
  2.2 Internationalization and reuse: Prerequisites for on-time localization
  2.3 Generic enterprise localization process
    Analysis
    Preparation
    Translation
    Engineering and testing
    Review
3. The future of localization and translation
References
Handbook of Translation Studies, 2010. Revised 2011. © 2011 John Benjamins Publishing Company. Not to be reproduced in any form without written permission from the publisher.
1. Perspectives
Localization is the linguistic and cultural adaptation of digital content to the requirements and the locale of a foreign market; it includes the provision of services and technologies for the management of multilingualism across the digital global information flow. Thus, localization activities include translation (of digital material as diverse as user assistance, websites and videogames) and a wide range of additional activities. In contrast to the definitions provided by the Localization Industry Standards Association, LISA (2010), or Dunne (2006), this definition explicitly focuses on digital content and includes the management of multilingualism as one of the important localization activities.
The localization industry as it is known today emerged in the mid-1980s with the advent of personal computing. North American multinational software publishers were scouting for new markets for products that had already proven highly successful in the USA. They identified these new markets in Europe, concentrating their efforts initially on the richest countries in the region: France, Italy, Germany and Spain, the so-called FIGS countries. The localization service industry subsequently organised itself into Single Language Vendors (SLVs) and Multi Language Vendors (MLVs). In the mid-1990s, a dedicated localization tools industry emerged. Following a continued period of growth, Beninatto and Kelly (2009) estimate that the worldwide language services market will be worth US$25 billion by 2013.
Many digital publishers, including companies such as Microsoft and Oracle, now generate more than 60% of their overall revenues from their international business divisions. For these companies, localization is an instrument for unlocking global market opportunities and a part of their globalization efforts. It is, therefore, not surprising that their localization decisions are never based on the number of speakers of a particular language, but on the Gross National Product (GNP) of the market they target. While publishers localize their digital content into Danish (approx. 5m speakers), they do not do so for Amharic (approx. 17m speakers) and rarely, if ever, for Bengali (approx. 100m speakers).
Translators working in the localization industry are among the most innovative in their profession. In the early 1990s, they were the first to use computer-assisted translation tools (see Computer-aided translation) for large-scale projects. Both the characteristics of the material to be translated (very repetitive, large volumes, often of a technical nature) and the environment in which it was translated (highly computerised, experimenting with new technologies as they emerged) were highly conducive to the progressive introduction of advanced technologies such as electronic terminology databases and translation memories.
In more recent years, Central Europe, China and India have become the central hubs of the worldwide localization industry, mainly because of the lower cost of employment in these regions (Niode 2009). It can reasonably be expected that India and China will become more than just cheap localization hubs for large foreign multinationals; they will very soon become major publishers of digital content in their own right. According to a report by Barboza (2008) for the New York Times, China has surpassed the USA in the number of internet users. At 253 million, and with a penetration rate of under 20%, the number of Chinese internet users was already larger than that of the USA, which had already reached saturation point (at 70%). This development will soon lead to fundamental changes in the localization industry, which today still works with English as the default source language.
2. Localization: More than just interlingual translation
In an attempt to make the concept more accessible to the lay person, localization is often defined as "like translation, but more than that". As translation technologies and digital content have become almost ubiquitous, the difference between translation and localization has become blurred and somewhat difficult to define.
2.1 Characteristics
Today's localization projects are far from homogeneous. They can deal with anything from relatively static, large-scale enterprise applications such as database systems, to rapidly changing web-based content such as customer support information, to relatively small but very frequent, ad hoc, personal and perishable consumer-type content. A typical enterprise localization project, for example, can involve the translation of three million words, stored in 10,000 files, into up to one hundred languages, all to be made available within a very short period of time (Schäler 2004). Content is often multimodal: it can come as text, graphics, audio or video, and can be stored in a large variety of file formats. Content can be highly repetitive and is often leveraged from previous versions of the same core product. As digital publishers struggle with the ever-increasing demand on their capacities, they focus on standards, interoperability and process improvements, introducing sophisticated translation management systems (TMS). They also resort to internationalization and the reuse of previously translated material to achieve the required increase in efficiency.
2.2 Internationalization and reuse: Prerequisites for on-time localization
Publishers often approached localization as an afterthought. Deltas of nine months, i.e., the time between the release of the original version of the software and that of its localized version, were the norm. As the type of digital content published changed (from applications to multimedia to web content), so did its distribution to consumers and, subsequently, the demand for on-time localization: customers now expect content to be available in their own language without delay.
Two developments made on-time localization, or "simship" (the simultaneous shipment, or release, of digital content in a number of different languages and locales), possible for the first time in the early 1990s: internationalization and the reuse of previously localized material.
Internationalization, meaning the preparation of digital content for use in different languages as well as for easy localization, dramatically reduced the localization effort. Publishers ideally wanted to reduce this effort to translation alone, eliminating as far as possible costly software re-engineering, re-building and testing activities. Digital publishers had learned the hard way about the high cost of localization as an afterthought, so the most advanced of them decided to take localization upstream, closer to the design and development teams, starting with a smart, localization-friendly design and development of that content. Typical localization issues could thus be eliminated ahead of localization, not just for one but for all language versions of a product. Such issues include the restricted or inappropriate encoding of characters, hard-coded or concatenated strings, and ill-advised programmatic dependencies on specific strings, such as the infamous "Y" in many a software message reading "Press Y to continue".
Reuse of previous translations became the main strategy for cutting translation cost and time. Repetition processing, both within a single version and across versions of the same core content, started in the early 1990s, when translation memory technologies were first introduced to large-scale enterprise localization projects (Schäler 1994). In some projects, reuse rates of 60% and higher can now be achieved, significantly reducing translation cost and time.
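The reuse of previous translations described above can be illustrated with a minimal sketch of a translation memory lookup. It is not the mechanism of any particular tool: the function names, the sample segments and the 0.75 fuzzy-match threshold are all assumptions made for illustration, and similarity is approximated with Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segments from an earlier version
# mapped to their approved German translations (illustrative data only).
TRANSLATION_MEMORY = {
    "Click OK to save the file.": "Klicken Sie auf OK, um die Datei zu speichern.",
    "The file could not be opened.": "Die Datei konnte nicht geöffnet werden.",
}


def lookup(segment, memory, fuzzy_threshold=0.75):
    """Return (match_type, stored_source, translation) for a new segment."""
    # An exact match can be reused as it stands.
    if segment in memory:
        return "exact", segment, memory[segment]
    # Otherwise look for the most similar stored segment (a "fuzzy" match),
    # which a translator would still have to review and adapt.
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= fuzzy_threshold:
        return "fuzzy", best_source, memory[best_source]
    return "none", None, None


def reuse_rate(segments, memory):
    """Share of segments that have an exact match in the memory."""
    if not segments:
        return 0.0
    exact = sum(1 for s in segments if lookup(s, memory)[0] == "exact")
    return exact / len(segments)
```

In this sketch only exact matches count towards the reuse rate; whether and how fuzzy matches are counted is a project-level decision that the sketch leaves open.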
2.3 Generic enterprise localization process
While each localization project presents its own particular challenges and requires the localization process to be fine-tuned accordingly, most processes have core aspects in common.
Analysis
Prior to localization, a number of key questions need to be answered in relation to the project at hand:
Can the digital content be localized? Some digital content is so specific to its original market that localization would require significant re-development, making it financially unviable.
Is the content internationalized? Some digital content does not support the features of other languages and writing systems.
Is the content to be localized accessible? If localizable strings are hard-coded, i.e., embedded in the original code or in an image, they cannot be accessed by standard localization tools.
It is standard practice as part of the analysis to carry out a so-called pseudo translation, i.e., the automatic replacement of strings within digital content with strings containing characters of the target language. Pseudo translation can demonstrate in an easy, low-cost way the effect localization will have on the digital content at hand. The outcome of this phase is a report summarizing the results of the analysis and containing recommendations to the project teams on how to proceed.
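As a rough illustration of pseudo translation, the following sketch replaces vowels with accented characters, pads each string to simulate text expansion, and wraps it in brackets so that truncated or hard-coded strings stand out on screen. The character mapping, the {name} placeholder syntax, the 30% expansion factor and the function name are assumptions chosen for this example rather than features of any specific localization tool.

```python
import re

# Assumed mapping from ASCII vowels to accented look-alikes; a real project
# would choose characters from the target language's writing system.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")

# Assumed placeholder syntax: {name}-style fields that must stay untouched.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_translate(text, expansion=0.3):
    """Return a pseudo-translated copy of text, leaving placeholders intact."""
    parts = PLACEHOLDER.split(text)
    # Because the pattern has a capturing group, split() returns the
    # placeholders themselves at the odd positions of the result.
    converted = [
        part if index % 2 else part.translate(ACCENTED)
        for index, part in enumerate(parts)
    ]
    # Pad the string to mimic the growth in length typical of translation.
    padding = "~" * round(len(text) * expansion)
    return "[" + "".join(converted) + padding + "]"
```

Any string that still appears in plain ASCII after such a pass, or that loses its closing bracket in the user interface, points to a hard-coded string or a layout that cannot accommodate longer text.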
Preparation
Following the successful completion of the analysis phase, project managers, engineers and language leads prepare the localization kit for translators and engineers. The kit contains all the original source material; reference material such as terminology databases, translation memories, style guides and test scripts; and a task outline, milestones and financial plans. It also includes a description of all the deliverables, the responsibilities of the stakeholders, and all contact details.
Translation
While translation is at the centre of this activity, not all of the translation is necessarily done by translators. Some, or indeed all, of it can be delivered (semi-)automatically by sophisticated computer-aided translation technologies, including terminology database, translation memory (TM) and machine translation (MT) systems. In cases where all of the source material is pre-translated using, for example, a hybrid automated translation system, it is not translation but post-editing that is required. Translators also need to support computer-assisted translation tools and their associated language resources, which involves maintaining multiple large terminology databases and TMs across products, versions and clients, as well as tuning and using MT systems. While some platforms and localization tools provide a visual translation environment that allows translators to see the context and appearance of the strings being translated, this is not always the case. Strings might have to be translated out of context. Combined with significant pressure to produce high-quality translations within short time frames, this makes for a very stressful, alienated, highly automated and technical translation environment for which specialised training is required (Schäler 2007).
Engineering and testing
Following translation, digital content must always be re-assembled and tested (or quality assured) for functionality, layout and linguistic correctness. While properly internationalized digital content significantly helps to cut down on the engineering and testing (QA) effort necessary, translation can have an unexpected effect on the functionality and appearance of the content (Jiménez-Crespo 2009). Even strings that have been translated correctly can be corrupted when used by an application or a browser for reasons not always apparent to translators, localization engineers and testers, and can require significant efforts to be rectified before the final product can be released.
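Some of the checks carried out at this stage are mechanical and can be automated. The sketch below covers only three such cases (lost or altered placeholders, translations that exceed a length limit, and empty translations) and says nothing about linguistic correctness or the less obvious causes of corruption mentioned above. The {name} placeholder syntax, the length limit and the function name are assumptions made for illustration.

```python
import re

# Assumed placeholder syntax: {name}-style fields filled in at run time.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def check_translation(source, target, max_length=None):
    """Return a list of mechanical problems found in one translated string."""
    problems = []
    # Every placeholder in the source must reappear unchanged in the target,
    # although its position in the sentence may differ between languages.
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(target)):
        problems.append("placeholder mismatch")
    # A translation that exceeds the space available may be truncated on screen.
    if max_length is not None and len(target) > max_length:
        problems.append("too long")
    if not target.strip():
        problems.append("empty translation")
    return problems
```

For example, a German translation that moves {count} to the start of the sentence passes the check, whereas one that translates the placeholder name itself is flagged.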
Review
Following each localization project, a thorough review is conducted by the localization teams involving both the client and the vendor site. The aim of this review is to reinforce successful strategies and to avoid mistakes when dealing with similar projects in the future.
3. The future of localization and translation
Discussions about localization and translation have long revolved around a rather predictable set of issues, with the role of technology, automation, standards, interoperability and efficiency in translation and localization featuring prominently (Genabith 2009). This is because both the discussion of and the research into localization-related issues have been dominated by the pragmatic, commercial agenda of the localization industry, an industry driven almost exclusively by the desire to maximise the short-term financial return on the investment that multinational digital publishers make in developing their digital content.
This rather narrow focus of current mainstream localization activities is beginning to expand. The development is driven by people and organisations who have recognised that localization and translation are important not just for commercial, but also for social, cultural and political reasons; they can keep people out of prison, enhance their standards of living, improve their health and, in extreme cases, even save their lives. A recent, though rather short-lived, example of such activity was the reaction to the Haiti disaster in early 2010, when a large number of localization service providers, as well as an even larger number of individuals, volunteered their services to help the people of Haiti. The reaction to this catastrophe drove truly innovative efforts in disaster relief involving translation and localization, such as the 4636 multilingual emergency text service reported by Ushahidi and Envisiongood. Still, there is a clear urgency to explore more sustainable and long-term alternatives to current mainstream localization and translation, going beyond those that react to disasters in an immediate and often uncoordinated and unsustainable way.
Access to information and knowledge in your own language, using media such as the world wide web, is no longer a "nice to have" or an option; it is a human right and should be recognised as such, as De Varennes (2001) points out. Initiatives to make localization and translation technologies and services available to all, including those who currently do not have access to them for geographical, social or financial reasons, have shown very promising results. One of the most prominent examples is that of the IDRC, the Canadian Government's development agency, which has been funding both the South East Asian (IDRC 2003) and the African (IDRC 2008) networks for localization. Another, more recent, example is The Rosetta Foundation. Perhaps it is not surprising, and should have been expected, that the hottest and most promising topics in the current localization debate (crowdsourcing, collaborative translation and wikifization) are again about to be taken over by industry interests rather than by those of society, at a time when they could start to support the educational, health, justice and financial information requirements of those most in need.
References
Barboza, David. 2008. China Surpasses U.S. in Number of Internet Users. The New York Times. 26 July 2008. http://www.nytimes.com/2008/07/26/business/worldbusiness/26internet.html [Accessed 27 April 2010].
Beninatto, Renato S. & Kelly, N. 2009. Ranking of Top 30 Language Services Companies. http://www.commonsenseadvisory.com/Research/All_Users/090513_QT_2009_top_30_lsps/tabid/1692/Default.aspx?zoom_highlight=ranking [Accessed 27 April 2010].
De Varennes, F. 2001. Language Rights as an Integral Part of Human Rights. IJMS: International Journal on Multicultural Societies 3 (1): 15-25. http://unesdoc.unesco.org/images/0014/001437/143789m.pdf#143762 [Accessed 10 May 2010].
Dunne, Keiran J. 2006. Putting the Cart Behind the Horse: Rethinking Localization Quality Management. In Perspectives on Localization, Keiran J. Dunne (ed.), 95-117. Amsterdam & Philadelphia: John Benjamins.
Genabith, Josef van. 2009. Next Generation Localisation. In Localisation Focus - The International Journal of Localisation 8 (1): 4-10. http://www.localisation.ie/resources/locfocus/vol8issue1.htm [Accessed 6 May 2010].
IDRC. 2003. PAN Localization: Building Local Language Computing Capacity in Asia. http://www.idrc.ca/panasia/ev-51828201-1-DO_TOPIC.html [Accessed 27 April 2010].
IDRC. 2008. African Network for Localisation (Anloc). http://www.idrc.ca/acacia/ev122243201-1-DO_TOPIC.html [Accessed 27 April 2010].
Jiménez-Crespo, M.A. 2009. The evaluation of pragmatic and functionalist aspects in localization: towards a holistic approach to Quality Assurance. In The Journal of Internationalisation and Localisation (IJIAL) 1: 60-93.
LISA. 2010. Localization. http://www.lisa.org/Localization.61.0.html [Accessed 27 April 2010].
Niode, Pricilla. 2009. Assessing the Southeast Asian Markets. In Multilingual Computing. September 2009: 49-52.
Schäler, Reinhard. 1994. A Practical Evaluation of an Integrated Translation Tool during a Large Scale Localisation Project. In Proceedings of the 4th Conference on Applied Natural Language Processing (ANLP-94). Stuttgart, Germany (October 13-15).
Schäler, Reinhard. 2004. Language Resources and Localisation. In Proceedings of the II International Workshop on Language Resources for Translation Work, Research and Training. A satellite event of COLING (28 August 2004). http://www.mtarchive.info/Coling-2004-Schaler.pdf [Accessed 27 April 2010].
Schäler, Reinhard. 2007. Translators and Localization. In The Interpreter and Translator Trainer 1: 119-135.