Testing multilingual support in Mail User Agents
TERENA Pilot Project
Yuri Demchenko, TERENA <demch@terena.nl>
TNC’98 Dresden, October 5-8, 1998
ML MUA Testing - TERENA Pilot Project

TERENA Pilot Project on Testing Multilingual MUAs
• Officially ran from April 1998 to September 1998
• Project objectives:
  • Develop a benchmarking methodology for multilingual MUAs, and specify templates for collecting the results in a coherent way.
  • Design a set of composite multilingual test messages.
  • Configure each MUA for all supported national character sets and send the test messages to other MUAs and to itself.
  • Compile the results, analyzing how each MUA composes, sends, receives and displays the test messages.
  • Prepare recommendations for users on the correct setup and operation of popular multilingual MUAs.

The list of mail clients to be tested
• Derived from TERENA MUA usage statistics, based on an analysis of more than 3000 messages from the TERENA mail archives collected from August 1997 to March 1998
• Microsoft Windows (NT, 3.11, 95)
  • Microsoft Outlook Express
  • Netscape Mail 3.x and 4.x
  • Netscape Messenger
  • Qualcomm Eudora 3.0 and 4.0 beta
  • Pegasus Mail
  • The Bat!
  • ESYS Simeon
  • Alis Tango Mailer
• UNIX Terminal
  • Elm
  • MH
  • Pine
• UNIX GUI (with X11R6)
  • Netscape Mail
  • EXMH
  • Z-Mail

Activity and Projects in i18n and Multilingual Support
• i18n activity (ISO, IETF, ECMA, TERENA, Unicode Consortium)
• CEN/TC304 work on European character sets and keyboards
• MAITS project
• Internet Mail Consortium - Report on Using International Characters in Internet Mail
• TERENA Pilot Project on Testing Multilingual Support in MUAs

Internet Mail Consortium - i18n Report
Summary of recommendations
1. Explicit charset parameter
2. Sending UTF-8
3. Displaying UTF-8
4. Choosing charsets on creation
5. Specifying languages
6. Multi-language text
7. Non-ASCII headers
8. Handling all common charsets
9. MTAs and 8-bit content
The report strongly recommends that all mail-creating and mail-displaying programs created or revised after January 1, 1999, be able to create and display mail using UTF-8, and be able to handle all common charsets in addition to UTF-8.

Standards on i18n and Character Set Technologies
• ISO standards
  • ISO 2022 Character Set Concept and Terminology
  • ISO 8859-x Character Sets
  • ISO Standards on APIs i18n and FDCC
• Unicode standards
• RFC 2277 - IETF Policy on Character Sets and Languages
• RFC 2130 - Recommendation of the IAB Workshop on character sets technology
• RFC 2045 - RFC 2049 - MIME format of messages (Using MIME in Internet Mail)
• RFC 822 - Syntax of electronic message format

Standards in i18n and Multilingual Support in Internet Mail
• RFC 2045 - RFC 2049, RFC 2231 - MIME
  • Coded Character Set
  • Character Encoding Scheme, specified by the charset parameter of the Content-Type header field
  • Transfer Encoding Syntax such as Base64 or Quoted-Printable (QP), specified by the Content-Transfer-Encoding header field
• RFC 2277 - IETF Policy on Character Sets and Languages
  • main definitions and requirements for language tagging
• RFC 2130 - Recommendation of the IAB Workshop on character sets technology
  • framework for interoperability between the many character sets in use
  • an architecture model for on-the-wire transmission of text
  • recommendations for tagging transmitted (and stored) text

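The two MIME header fields named above can be seen on a real message with a few lines of Python. This is only a sketch using the standard library email package; the helper name build_body and the sample text are assumptions made for illustration and are not part of the project.

```python
from email.message import EmailMessage


def build_body(text: str, charset: str, cte: str) -> EmailMessage:
    """Build a single-part text message with an explicit charset and transfer encoding.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    # charset -> the charset parameter of Content-Type (Character Encoding Scheme)
    # cte     -> the Content-Transfer-Encoding field (Transfer Encoding Syntax)
    msg.set_content(text, charset=charset, cte=cte)
    return msg


msg = build_body("Привет, мир", charset="koi8-r", cte="base64")
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
# Decoding reverses both layers and returns the original text.
print(msg.get_content())
```

Swapping cte to "quoted-printable" or charset to "utf-8" changes only the header values and the encoded body, not the text recovered by get_content().
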
RFC 2130 Architecture model
• User interface issues (OS, GUI, API)
  • Layout
  • Culture
  • Locale
  • Language
• On-the-wire
  • The Coded Character Set
  • The Character Encoding Scheme
  • The Transfer Encoding Syntax

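The three on-the-wire layers can be traced for a single string. The sketch below is illustrative only: the function name rfc2130_layers is an assumption, Unicode code points stand in for the coded character set, and Base64 is just one possible transfer encoding.

```python
import base64


def rfc2130_layers(text: str, charset: str) -> dict:
    """Walk a string through the three on-the-wire layers (illustrative only)."""
    # Coded Character Set: each character as an integer (Unicode code points here).
    code_points = [ord(ch) for ch in text]
    # Character Encoding Scheme: integers serialized to octets (the MIME charset).
    octets = text.encode(charset)
    # Transfer Encoding Syntax: octets turned into 7-bit-safe text (Base64 here).
    wire = base64.b64encode(octets).decode("ascii")
    return {"code_points": code_points, "octets": octets, "wire": wire}


layers = rfc2130_layers("Київ", "koi8-u")
# Undoing the layers in reverse order recovers the original text.
restored = base64.b64decode(layers["wire"]).decode("koi8-u")
print(layers["code_points"], layers["octets"], layers["wire"], restored)
```

The same text yields different octets under a different charset (for example "utf-8"), which is why the receiver needs the charset label to reverse the second layer.
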
The testing and evaluation scheme

Testing of Multilingual support in MUAs
• Includes the following phases:
  • Evaluation of multilingual features/settings of MUAs
  • Testing the message reading procedure
  • Testing the message composing procedure
  • Testing the message sending and receiving procedure

Evaluation of Multilingual features/settings of MUAs
• READ operation mode
  • choose Language/Encoding
  • choose Fonts (optionally for Address, Subject, Message Body, Quoted Text)
  • Optional - font mapping
• COMPOSE operation mode
  • choose Language/Encoding settings
  • Optional - possibility to switch Language/Encoding during composition/typing
  • choose Fonts (optionally for Address, Subject, Message Body, Quoted Text)
  • Optional - choose Spelling/Language/Dictionary
• SEND operation mode
  • set MIME encoding (Quoted-Printable, Base64)
  • Optional - select/disable Uuencode mode (non-standard)
  • allow/disallow 8-bit in header fields
  • select/disable HTML in body parts

Message Reading procedure
• Multilingual MUAs should support the following features:
  • Reading/displaying non-ASCII characters in the message body
  • Reading/displaying non-ASCII characters in the message header (Address, Subject lines)
  • Reading a forwarded message with non-ASCII characters in the Address, Subject and message body, using the same or different MIME character set attributes
  • Reading an attached non-ASCII text file (document)
• Possible problems are detected by comparing the appearance of the original and the delivered test messages
• This includes evaluating whether the MUA processes the MIME attributes of the test message correctly.

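Comparing an original header with the delivered one can be partly automated. The following sketch decodes RFC 2047 encoded-words with the Python standard library and compares the result with the text that was sent; the function names and sample strings are assumptions for illustration, not the project's tooling.

```python
from email.header import Header, decode_header, make_header


def decode_header_text(raw_value: str) -> str:
    """Decode RFC 2047 encoded-words in a raw header value into readable text."""
    return str(make_header(decode_header(raw_value)))


def header_survived(original_text: str, delivered_raw: str) -> bool:
    """True if the delivered header decodes to the text that was originally sent."""
    return decode_header_text(delivered_raw) == original_text


# What a sending MUA would put on the wire for a KOI8-R subject.
sent = Header("Привет", "koi8-r").encode()
print(sent)
print(header_survived("Привет", sent))
# A header that lost its non-ASCII characters in transit fails the check.
print(header_survived("Привет", "??????"))
```
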
Message Composing procedure
• Message composition operations to be tested
  • Typing a message from the keyboard
  • Copy and Paste operations
  • Text/file attachments
  • Quoted text/message
  • Editing different parts of a message
  • Charset/encoding processing by the message composer/editor
• Real message composition also includes operations like:
  • Typing non-ASCII text in the message body and message header
  • Pasting non-ASCII text into body and header fields
  • Replying to a message with non-ASCII text
  • Forwarding a message with non-ASCII content
  • Attaching text documents containing non-ASCII characters

Test messages set
Each test is performed in at least 2 character sets, one of which is US-ASCII (or ISO 8859-1), and the other containing characters that are not part of US-ASCII or ISO 8859-1.
• Mandatory
  • tmsg1 - Message with non-ASCII characters/text in the Subject line
  • tmsg2 - Message with non-ASCII characters/text in the mail address free-form name
  • tmsg3 - Message with non-ASCII characters/text in the message body text (single part)
  • tmsg4 - Message with non-ASCII characters/text in a text/plain attachment
• Optional
  • tmsg6* - Message with UTF-7/UTF-8 character set in the message body and header

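The four mandatory messages could be generated along the following lines. This is a sketch under stated assumptions, not the project's Test Messages Constructor: the addresses, the sample text and the function name build_test_messages are hypothetical, and Python's default mail policy encodes non-ASCII header text as UTF-8 encoded-words regardless of the body charset.

```python
from email.headerregistry import Address
from email.message import EmailMessage

SAMPLE = "Тест"  # hypothetical non-ASCII sample text


def build_test_messages(charset: str = "koi8-r") -> dict:
    """Return the mandatory test messages tmsg1-tmsg4 keyed by name."""

    def base(subject: str, display_name: str = "Tester") -> EmailMessage:
        m = EmailMessage()
        m["From"] = Address(display_name, "tester", "example.org")
        m["To"] = "iut@example.org"
        m["Subject"] = subject
        return m

    msgs = {}
    # tmsg1: non-ASCII text in the Subject line
    msgs["tmsg1"] = base("tmsg1 " + SAMPLE)
    msgs["tmsg1"].set_content("ASCII body")
    # tmsg2: non-ASCII text in the free-form name of the address
    msgs["tmsg2"] = base("tmsg2", display_name=SAMPLE)
    msgs["tmsg2"].set_content("ASCII body")
    # tmsg3: non-ASCII text in a single-part body
    msgs["tmsg3"] = base("tmsg3")
    msgs["tmsg3"].set_content(SAMPLE, charset=charset)
    # tmsg4: non-ASCII text in a text/plain attachment
    msgs["tmsg4"] = base("tmsg4")
    msgs["tmsg4"].set_content("See attachment")
    msgs["tmsg4"].add_attachment(
        SAMPLE, subtype="plain", charset=charset, filename="test.txt"
    )
    return msgs


msgs = build_test_messages()
for name, m in msgs.items():
    print(name, m.get_content_type())
```

Calling build_test_messages once per charset under test gives the "at least 2 character sets" coverage described above.
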
Testing program map

Testing Methodology - The tests to be performed
• test-1 - Receive all 4 test messages tmsg1-tmsg4 and display them correctly (change Language/Alphabet/Encoding options if needed)
• test-2 - Print all 4 messages tmsg1-tmsg4 to the standard printer
• test-3 - Reply to messages tmsg1 and tmsg2, and check that the information is returned in the same character set as it arrived in
• test-4 - Reply to message tmsg3 using "reply including quote of body"
• test-5 - Reply to message tmsg3 using the environment's "cut and paste" function to insert the non-ASCII characters into the outgoing message
• test-6 - Forward all 4 messages to the originator address
• test-7 - Generate, as completely as possible, the same messages from the keyboard of the IUT (implementation under test)
• test-8* - Check for possible text distortion when exchanging tmsg1, tmsg2 and tmsg3 with a non-ASCII default Language/Alphabet/Encoding
• test-9* - Perform tests 1-5 for message tmsg6* with UTF-7/UTF-8

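A check in the spirit of test-3 (is the reply in the same character set as the message it answers?) might be scripted as below. This is a simplified sketch: it looks only at the body charset label, and the function names and sample messages are assumptions for illustration.

```python
from email import message_from_bytes
from email.policy import default


def body_charset(raw: bytes):
    """Return the charset label of the text/plain body, or None if absent."""
    msg = message_from_bytes(raw, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content_charset() if part is not None else None


def reply_keeps_charset(original_raw: bytes, reply_raw: bytes) -> bool:
    """True if the reply body is labelled with the same charset as the original."""
    a, b = body_charset(original_raw), body_charset(reply_raw)
    return a is not None and a == b


COMMON = b"MIME-Version: 1.0\r\nContent-Transfer-Encoding: 8bit\r\n"
original = (COMMON + b"Content-Type: text/plain; charset=KOI8-R\r\n\r\n"
            + "Привет".encode("koi8-r"))
good_reply = (COMMON + b"Content-Type: text/plain; charset=koi8-r\r\n\r\n"
              + "Привет".encode("koi8-r"))
bad_reply = (COMMON + b"Content-Type: text/plain; charset=windows-1251\r\n\r\n"
             + "Привет".encode("cp1251"))

print(reply_keeps_charset(original, good_reply))
print(reply_keeps_charset(original, bad_reply))
```

The charset names are compared case-insensitively because get_content_charset() returns them in lower case.
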
Testing Results Presentation

ML MUAs Testing Results and Data Analysis
• Testing results are documented and presented at
  • http://park.kiev.ua/multiling/ml-mua/prjdocs/mlmua-repv1.html
• Standards overview on Internationalisation and Multilinguality
  • http://park.kiev.ua/multiling/ml-mua/mldoc-review.html
• Test messages constructor, pilot version
  • http://park.kiev.ua/multiling/ml-mua/testcon.html

Evaluation of ML MUAs
• First group - MUAs that support multiple languages/alphabets by supporting multiple charsets and using internal language/charset transformation
  • Microsoft Outlook Express
  • Netscape Messenger 4.04 and the previous product, Netscape Mail 3
  • exmh for X Windows
• Second group - MUAs that provide ML support by selecting the proper font for creating and displaying messages
  • Eudora Pro 3.0
  • Pegasus
  • Forte Agent
  • The Bat!
  • Simeon
  • UNIX Terminal products
    • pine
    • elm

First group - Full Multilingual Support
• Microsoft Outlook Express
  • has the best and richest multilingual support
  • uses an effective internal conversion scheme that users can control well via setup and the Alphabet/Charset selection menu
• Netscape Messenger 4.04 and Netscape Mail 3.04
  • provide rich multilingual support for many charsets/encodings
  • but are very inflexible for languages that have many charsets in use (e.g., Cyrillic Windows CP-1251 and KOI8-R/U for Russian/Ukrainian, or ISO 8859-2 and Windows CP-1250 for Central European languages)
  • Netscape products for X Windows have the same features
• exmh for X Windows
  • provides good support for the main groups of European languages using Latin 1, Latin 2 and Cyrillic charsets

Second group - Simplified Multilingual Support
• Popular in the Latin1 (ISO 8859-1) and English-speaking community
• Support for languages and charsets/encodings is provided by selecting the proper font for creating and displaying messages
• Eudora Pro 3.0
• Pegasus
• Forte Agent
• The Bat! - provides simple conversion between Cyrillic encodings (ISO 8859-5, Windows CP-1251, KOI8-R)
• Simeon
• pine and elm for UNIX

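Conversion between the Cyrillic encodings named above amounts to decoding with one table and re-encoding with another. The sketch below uses Python's built-in codecs; it does not describe how any particular MUA implements the conversion, and the function name transcode is an assumption.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode octets from one charset to another via Unicode."""
    return data.decode(source).encode(target)


text = "Привет"
koi8 = text.encode("koi8-r")
cp1251 = transcode(koi8, "koi8-r", "cp1251")
iso5 = transcode(koi8, "koi8-r", "iso-8859-5")

# All three byte strings differ, yet each decodes to the same text
# when read with its own charset.
print(koi8, cp1251, iso5)

# Font-only "support" skips this step: KOI8-R octets shown with a CP-1251
# font produce readable Cyrillic letters, but the wrong ones.
misread = koi8.decode("cp1251")
print(misread)
```
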
Common problems of multilingual support in MUAs
• Conversion between different encodings/charsets for the same language
• Correct processing of MIME tags in message header fields (Subject and Address lines) on display when the charset named in the header differs from that of the message body
• The same problems occur when the user changes the charset/encoding while displaying or composing a message, or uses Copy & Paste between different charsets
• Viewing the message source and/or message info (charset/encoding for the header and body, multipart MIME structure, and so on)
• Using common and correct terminology for language/charset settings in MUAs

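The "message info" that testers often had to dig out by hand (charset and encoding per part, multipart structure) can be listed with a short script. This is a sketch using the Python standard library; the function name describe and the sample message are assumptions for illustration.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def describe(raw: bytes) -> list:
    """List (content type, charset, transfer encoding) for every MIME part."""
    msg = message_from_bytes(raw, policy=default)
    rows = []
    for part in msg.walk():
        cte = part.get("Content-Transfer-Encoding")
        rows.append((
            part.get_content_type(),
            part.get_content_charset(),
            str(cte) if cte is not None else None,
        ))
    return rows


# Hypothetical sample: a KOI8-R body plus a UTF-8 text attachment.
sample = EmailMessage()
sample["Subject"] = "probe"
sample.set_content("Привет", charset="koi8-r", cte="base64")
sample.add_attachment("Тест", subtype="plain", charset="utf-8", filename="t.txt")

rows = describe(sample.as_bytes())
for row in rows:
    print(row)
```
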
Project’s Main Results
• The international environment of the project made it possible to identify the main problems of multilingual support in MUAs
• Multilingual test messages set
• Evaluation scheme for forthcoming ML MUAs
• Project activity was conducted in coordination with other multilingual-related projects:
  • IMC MAIL-I18N report on Internationalization and Character Set technologies
  • Mozilla i18n project (Netscape 5.0)
    • Project team members have contributed to the new Ukrainian-language-enabled Mozilla
    • the proposed model of multilingual support in MUAs was discussed
  • ESYS Simeon IMAP Mail multilingual features testing

Follow-on Projects and activity
• Testing new products using the proposed methodology
  • New releases of Outlook Express 98, Netscape Messenger 4.5 and 5.0
  • New products of 1999 that are expected to implement the IETF/IMC recommendations
• Other areas of further activity
  • Establishing a repository of ML/i18n supporting charsets for online support of multilingual mail (download of mapping reference tables, translation, configuration, etc.)
  • Creating a Web-based ML test messages Constructor, a pilot version of which is demonstrated at the project’s page
    • http://park.kiev.ua/multiling/ml-mua/testcon.html

Test Messages Constructor
http://park.kiev.ua/multiling/ml-mua/testcon.html

Test Messages Constructor - Creating a test message

Project Team
• Yuri Demchenko, TERENA
• Konstantin Chuguev, Ural Technical University, Russia
• Janja Faganel, Jozef Stefan Institute, Slovenia
• Vadim Shevchenko, Kiev Polytechnic Institute
• Alexey Medvedev, Kiev Polytechnic Institute

Acknowledgments
• Borka Jerman-Blazic, Jozef Stefan Institute, Slovenia
• Claudio Allocchio, Sincrotrone Trieste & INFN Trieste, Italy
• Peter Heijmens Visser, TERENA, for providing the MUA usage statistics
• Harald T. Alvestrand, Maxware, Norway

IMPORTANT NOTE
The Multilingual page will be moved to, and supported at, the TERENA web server: http://www.terena.nl/multiling/

Russian/Ukrainian Languages
Historical overview
• VI-XI cent. - Ancient Rus written language
• X-XIV cent. - Cyrillic written language
  • Invented by Cyril and Methodius (Thessaloniki) in the IX cent.
  • First introduced in Moravia with the advent of Christianity
  • Introduced in Kievan Rus with the advent of Christianity in the X cent.
• XIV-XVII cent. - Formation of the Russian literary language
  • With the formation of the Moscow State after the Mongol yoke
• XVII cent. - Development of the modern Russian literary language
  • Lomonosov, Pushkin

Ukrainian Literary Language
• Common ancient roots with Russian and all Slavic languages
• Was influenced by centuries of conquerors’ languages
  • features of an analytical language (like English)
• 1818 - Grammar of the Ukrainian (Malorussian) dialect published
  • introduced the Ukrainian “і”, “Ґґ” (for “kg” sounds), and the spelling of “дз”, “дж”
• Formation of the modern Ukrainian literary language (Taras Shevchenko)
• 1921 - “Main rules of Ukrainian orthography” published
• 1984 - introduction of the new/lost Ukrainian letter “Ґґ”
