
Internationalization Localization & Unicode

Karunesh Arora, Vijay Gugnani, C-DAC Noida. Internationalization, Localization & Unicode. “Everyone has the right... to seek, receive and impart information and ideas through any media regardless of frontiers” -- Universal Declaration of Human Rights.


Presentation Transcript


  1. Internationalization, Localization & Unicode. Karunesh Arora, Vijay Gugnani, C-DAC Noida

  2. “Everyone has the right... to seek, receive and impart information and ideas through any media regardless of frontiers” -- Universal Declaration of Human Rights

  3. Internationalization: Internationalization, often referred to as i18n, is the practice of designing and developing an application, product, or document so that it can be easily localized for target audiences that vary in culture, region, or language.

  4. Why Internationalization? To remove barriers to local and international access. To allow adaptation to local, regional, linguistic, or cultural needs. To provide global reach. ROI and revenue generation.

  5. Internationalization vs. Localization: Localization is the actual adaptation to meet the language, cultural, and other requirements of a specific target audience. While internationalization gives us the technology and tools to target a given audience, it is the act of localization that makes the product accessible to that audience.

  6. What goes with localization? Localization is much more than translation. Specifically, localization refers to adaptation to another language, which involves appropriate language translation, locale transformation, and handling of cultural aspects.

  7. Language Translation: Languages and Countries. Most languages are used in many countries, not just those where they are dominant or “official”. People migrate and take their languages with them. Over enough time, most languages evolve differently in different locations.

  8. Language Translation: Scripts and Languages. A “script” may be defined as a collection of related characters. It is common for several languages to share most, but not all, characters from a given script. Scripts are often given the same name as one of the languages that use them: there is one Arabic script, but it is used by the Arabic, Farsi, Urdu,… languages. Scripts are also given a common name for a group of languages: the Devanagari script for Hindi, Marathi, Nepali, Konkani, etc.

  9. Language Translation: Some points to consider. Identify ‘translatable’ and ‘non-translatable’ strings. Gender and number agreement and the ordering of segments in a sentence differ between languages (e.g. Page number ->, Number of pages ->). Text in many languages can take at least 30% more space than the source, e.g. Tool – उपकरण (HI) & ग्राहक - customer (EN). The design should allow for this, or else the UI may have to be redesigned. Narrow columns often cannot accommodate long target-language equivalents.

  10. Language Translation: Some points to consider (contd.). Avoid ambiguous phrases: ‘Display options’ can mean the options of the display (noun + noun) or show the options, all of them (verb + noun). Proverbs and metaphors may not have equivalents in the target language. Keep Web pages and paragraphs short. Avoid text in graphics. Use simple grammatical structures. Use everyday language. Provide clues.

  11. Language Translation: Some points to consider (contd.). Follow source language conventions. Avoid acronyms. Abbreviations may have to be expanded when translated. Check spelling and grammar. The more compact the source writing, the longer the translation. Brief translators about the purpose and target audience. All items in a menu or set of check boxes should have the same grammatical structure.

  12. Locale: a set of parameters that define the user’s language, country, and cultural preferences.

  13. Different aspects of locale: names and titles; calendars; numeric, date, and time formats; addresses; currencies; paper size; weights and measures; input mechanism; language selection; oral pronunciation.

  14. Titles and Names: In India, it is required to specify titles, and these titles do not necessarily translate. The family name is not always last (in the South and West of the country). Sorting can be based on the last name or the first. Salutations in letters (e.g. Dear) differ between locales.

  15. Titles and Names Source: Delhi Press Prakashan

  16. Calendars: The Gregorian calendar should not always be assumed. Proper localization of some software requires the use (at least as an option) of calendars distinct to a culture, e.g. the Vikram Samvat, Saka, or Hijri calendars in India. Various religions have calendars in which year 0 was not 2006 years ago. Fiscal-year-based calendars vary widely; some have 13 months (364/28) or 53 weeks.

  17. Date formats: Date separators depend on the locale: ‘/’, ‘-’, ‘.’. ‘am’ and ‘pm’ are not used universally (many cultures use the 24-hour clock). ISO standard dates are unambiguous: yyyy-mm-dd hh:mm:ss. A non-ISO date such as 01-03-02 means different things in different locales. If not using ISO, display dates in the locale of the user, preferably in a ‘long’ form with the month spelled out (in the correct language).
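The ambiguity of 01-03-02 can be made concrete with a short Python sketch. The three field orders below are illustrative assumptions about what different locales might expect; they are not taken from the slides.

```python
from datetime import datetime

ambiguous = "01-03-02"

# The same string parsed under three plausible field orders.
day_first = datetime.strptime(ambiguous, "%d-%m-%y")    # day-month-year
month_first = datetime.strptime(ambiguous, "%m-%d-%y")  # month-day-year
year_first = datetime.strptime(ambiguous, "%y-%m-%d")   # year-month-day

for label, value in [
    ("day first", day_first),
    ("month first", month_first),
    ("year first", year_first),
]:
    # isoformat() produces the unambiguous yyyy-mm-dd form.
    print(f"{label:12} -> {value.date().isoformat()}")
```

Each reading yields a different calendar day, which is why an ISO or spelled-out form is safer when the reader's locale is unknown.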

  18. Formatting Numbers: Number formatting depends on the locale, not on the language of the application. Group separation: the number of digits in a group is 3 in English and ISO, while for Indic languages it is different: 1,23,456, i.e. ##,##,##,###. Group separator: ‘,’ in English, but ISO uses a space, and some locales use ‘.’ or none. Decimal separator: ‘.’, ‘,’. Negative symbol: ‘-’, ‘~’, ‘(…)’.
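A minimal Python sketch of the Indic grouping pattern (last three digits, then groups of two). The function name `group_indian` is hypothetical and handles integers only; a production system would more likely rely on a locale library such as ICU.

```python
def group_indian(n: int) -> str:
    """Group digits as ##,##,##,###: last three digits, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel off two digits at a time from the right of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


print(group_indian(123456))  # Indic grouping
print(f"{123456:,}")         # three-digit grouping, for comparison
```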

  19. Currency: Use the currency symbol of the data, i.e. INR doesn’t automatically translate to £ or $ when the locale changes. The format depends on the user’s locale, not the currency. Differences in formats: symbol position (before or after the amount); blanks separating the symbol from the amount.

  20. Currency (contd.): Different ways of expressing Rs. 1000: Rs.1000 or Rs. 1000/- or Rs.1,000/- or Rs. 1000.00; INR 1000; 1000 Rupees; 1000 रुपये. Strong currencies like the Indian rupee need decimal precision (e.g. 2 digits after the decimal point for paisa).

  21. Language selection: Avoid using national flags to choose the preferred language, since multiple countries use the same language. In what order should the languages be listed? In what language should the language names be displayed? In the language itself, or with a translation in the default language of the operating system.

  22. Pronunciation: Important for speech-based systems. Higher recognition accuracy can be obtained by tailoring voice input to regional dialects. Voice output in the wrong dialect can make an application sound ‘foreign’. Applications that support regional dialects have better impact.

  23. Culture: Culture is a complex collection of experiences which condition daily life; it includes history, social structure, geographical effects, religion, traditional customs, and everyday usage.

  24. Cultural issues: icons, symbols, and images; colors, myths, beliefs, and feelings; humour; geographical and environmental effects; customs and traditions; social security numbers.

  25. Icons & Symbols: Icons that are a play on words do not translate, e.g. a dust bin for dumping files, a rocket for launching an application, scissors for cutting in an edit operation, “B”, “I”, “U”. Some concepts have been found extremely hard to represent as an icon, e.g. sorting (‘A->Z’ is not universal). Images of people or body parts such as hands are considered inappropriate in some cultures. What skin color do you use? Images of people need to be localized for each country.

  26. Colors & Humour: The color white may represent purity and green prosperity in the Indian context, but this may not hold in another culture. Humour generally does not translate. People are sensitive to different things in different cultures. Jokes and cartoons can be offensive.

  27. Customs & Traditions: In Indian culture, people show respect to their elders and to renowned personalities by addressing them in the plural, e.g. “Dr. Manmohan Singh is the prime minister of India”: डॉ. मनमोहन सिंह भारत के प्रधानमंत्री हैं। Similarly, in social relationships, there are several words for a single relation, e.g. for ‘uncle’: चाचा, ताऊ, मौसा

  28. Unicode? Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Source: http://unicode.org

  29. Universal Character Encoding: Unique number for every character …

  30. Unifies all Languages: 96 thousand characters, so far. All characters accessible at the same time, in the same document: क, க, ಔ,…
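The "unique number for every character" idea can be inspected directly in Python, whose strings are sequences of Unicode code points. This is a small sketch using the three sample characters from the slide; the helper name `code_point` is hypothetical.

```python
import unicodedata


def code_point(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


# Characters from three different scripts living in one string.
for ch in "कகಔ":
    print(code_point(ch), unicodedata.name(ch), ch)
```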

  31. Wide Spread Support: Developed & supported by industry leaders: Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, … Supported in standards: XML, HTML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, Perl, etc. Implemented in: all modern operating systems, browsers, and other products.

  32. IDN: http://भाषा.in
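An internationalized domain name such as the one above is carried in DNS as an ASCII-compatible form. The sketch below uses Python's built-in `idna` codec, which follows the older IDNA 2003 rules; whether that matches the rules applied by a given registry is an assumption to verify.

```python
domain = "भाषा.in"

# Convert each label to its ASCII-compatible ("xn--") form and back.
ascii_form = domain.encode("idna")
round_trip = ascii_form.decode("idna")

print(ascii_form)
print(round_trip == domain)
```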

  33. Information about Unicode: www.unicode.org. Online Standard, Technical Reports, FAQs, General Information, Discussion Forums, Conferences.

  34. Resources Availability: System APIs: Windows, Java, Unix, Oracle, DB2, Sybase, Mac, Linux, … Languages: Java, JavaScript, C#, Perl 5.6.0, C, C++, SQL, … Cross-platform libraries: ICU, Rosette, …

  35. Indic Support in Unicode: ISCII is the basis for the characters and their allocation. DIT is a member of the Consortium. Reports have been submitted on missing characters, clarifications, or corrections of usage.

  36. ISCII: Similarities. Within a script, layout and contents are nearly identical. Independent + dependent vowels. Halant model for representing conjuncts: conjuncts / half-forms are not directly encoded, but represented by sequences instead. Phonetic sequence: order in syllables.
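The halant model can be seen by listing the code points behind a conjunct. In this sketch the conjunct क्ष is written with escapes so the underlying sequence is explicit; it is a single visual unit but three encoded characters.

```python
import unicodedata

# KA + VIRAMA (halant) + SSA, displayed as the single conjunct क्ष
conjunct = "\u0915\u094D\u0937"

names = [unicodedata.name(ch) for ch in conjunct]
for ch, name in zip(conjunct, names):
    print(f"U+{ord(ch):04X} {name}")
```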

  37. ISCII: Differences. Unicode is stateless: no shifting to get different scripts; each character has a unique number. Unicode is uniform: no extension bytes necessary; all characters are coded in the same space.

  38. Advantages: Accessible information across the globe. Seamless multilingual documents. Opens up the software export market, beyond English. Connects India to the world.

  39. The Future: The world is moving rapidly to Unicode. Unicode makes India open to the world. The world comes to you, and you go to the world.

  40. Multiple Forms: UTF-8: maximal compatibility with 8-bit systems. UTF-16: good storage, interoperability with Windows/Java. UTF-32: simplest processing. Fast, lossless conversion between them.
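The trade-off between the three encoding forms can be observed by encoding the same text in each and checking that it decodes back unchanged. A minimal sketch; the helper `encoded_sizes` is hypothetical, and the little-endian codec names are chosen only to avoid counting a byte order mark.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in each Unicode encoding form."""
    sizes = {}
    for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
        data = text.encode(encoding)
        # Conversion is lossless: decoding restores the original text.
        assert data.decode(encoding) == text
        sizes[encoding] = len(data)
    return sizes


print(encoded_sizes("A"))
print(encoded_sizes("क"))
```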

  41. W3C Internationalization Activity

  42. Some Issues under discussion in IL (Indian Languages): Presentation / Styling issues. Styling of first character: if a styling feature is to be applied to the starting character, should it apply to a single character, a conjunct character, a syllable, or a grapheme cluster? e.g. स्थिति (Position) प्रस्थान (Departure) स्वर (Vowel) कोश (Dictionary) हिंदी (Hindi) हिन्दी (Hindi) क्षेत्रीय(Regional)
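To show why "first character" is ambiguous, here is a deliberately simplified Python sketch that extracts a first syllable-like unit from a Devanagari word. The heuristic (keep combining marks, and keep a consonant that follows a virama) is an assumption made for illustration; it is not the full Unicode grapheme cluster algorithm and ignores cases such as joiner characters.

```python
import unicodedata

VIRAMA = "\u094D"  # Devanagari halant


def first_syllable(word: str) -> str:
    """Return the leading base character plus what visually attaches to it."""
    if not word:
        return ""
    end = 1
    while end < len(word):
        ch = word[end]
        if unicodedata.category(ch) in ("Mn", "Mc"):
            end += 1  # vowel signs, anusvara, virama
        elif word[end - 1] == VIRAMA:
            end += 1  # consonant joined into a conjunct
        else:
            break
    return word[:end]


for word in ["स्थिति", "प्रस्थान", "हिंदी", "क्षेत्रीय"]:
    print(word, "->", first_syllable(word), "vs first code point", word[0])
```

Styling only `word[0]` would split a conjunct, which is the issue the slide raises.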

  43. Some Issues under discussion in IL: Presentation / Styling issues. Styling of first character.

  44. Some Issues under discussion in IL: Presentation / Styling issues. In cursive text like Arabic and Urdu, the styling is applied to the whole word. Urdu: Saabiq -> Former. Source: Rashtriya Sahara

  45. Some Issues under discussion in IL: Presentation / Styling issues. Vertical arrangement of characters: if a string is written in vertical mode, writing each character on a new line may not be suitable. http://www.w3.org/International/notes/firstletter.html

  46. Some Issues under discussion in IL: Presentation / Styling issues. Horizontal spacing.

  47. Some Issues under discussion in IL: Presentation / Styling issues. Bullets and numbers: number schemes are to be supported in Indian languages also.

  48. Some Issues under discussion in IL: Presentation / Styling issues. Collation: a means to search and order data in a way that makes sense in a particular culture. Myths: one collation is good enough; being Unicode enabled means sorting is already covered.
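The second myth can be illustrated even with plain English words: sorting by raw code point is not the order a reader expects. The word list below is made up for illustration, and case folding is only a stand-in for real locale-aware collation, which would normally come from a library such as ICU.

```python
words = ["banana", "Zebra", "apple"]

# Default sort compares code points, so every uppercase letter precedes lowercase.
by_code_point = sorted(words)

# A crude approximation of dictionary order for this sample only.
by_casefold = sorted(words, key=str.casefold)

print(by_code_point)
print(by_casefold)
```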

  49. Some Issues in Indian Languages: Presentation / Styling issues

  50. Some Issues under discussion in IL: Presentation issues. Underlining of the characters, e.g. अन्य भाषाओं में भी अनुवाद (translation into other languages too)
