1 / 26

Unicode and Collation Support in Microsoft SQL Server

Michael S. Kaplan Globalization Infrastructure and Font Technology Windows International Microsoft. Unicode and Collation Support in Microsoft SQL Server. Unicode Support. Uses the "N" or national data types from the SQL-92 specification NCHAR, NVARCHAR, NTEXT

Download Presentation

Unicode and Collation Support in Microsoft SQL Server

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Michael S. Kaplan Globalization Infrastructure and Font Technology Windows International Microsoft Unicode and Collation Support in Microsoft SQL Server

  2. Unicode Support • Uses the "N" or national data types from the SQL-92 specification • NCHAR, NVARCHAR, NTEXT • What the SQL-99 spec says about Unicode • Interoperability with other clients Prague, Czech Republic (IUC23)

  3. Collation in SQL Server <= 6.5 • No Unicode support at all • One code page per server • One collation per server • No good solution for multilingual support Prague, Czech Republic (IUC23)

  4. Collation in SQL Server 7.0 • Unicode datatypes supported • Two collations • Unicode • Non-Unicode • Number of collations distilled down to the minimum necessary Prague, Czech Republic (IUC23)

  5. 7.0 flattening of collations • Example: the General Unicode sort order handles: Afrikaans, Albanian, Arabic, Basque, Belarusian, Bulgarian, English, Faeroese, Farsi, Georgian (Traditional), Greek, Hebrew, Hindi, Indonesian, Malay, Russian, Serbian, Swahili, and Urdu Prague, Czech Republic (IUC23)

  6. OS independence • Collation independent of operating system • Based on the Jet “Unicorn” DLLs Prague, Czech Republic (IUC23)

  7. SQL Language Support(limited locale information) • Messages • Date/Time • First Day of Week • Currency and currency symbols • Month/day names and abbreviated month names Prague, Czech Republic (IUC23)

  8. SQL Language Support(list of languages) • Italian • Japanese • Korean • Latvian • Lithuanian • Norwegian • Polish • Portuguese • Romanian • Russian • Slovak • Slovenian • Spanish • Swedish • Thai • Turkish • Arabic • British English • Brazilian • Bulgarian • Simplified Chinese • Traditional Chinese • Croatian • Czech • Danish • Dutch • English • Estonian • Finnish • French • German • Greek • Hungarian Prague, Czech Republic (IUC23)

  9. Getting at the list of languages • sp_helplanguage stored procedure • syslanguages/sysmessages tables • SET LANGUAGE • SET LANGUAGE čeština • SET LANGUAGE 한국어 • Each language has a langid (0 – 32) Prague, Czech Republic (IUC23)

  10. Collation in SQL Server 2000 • Combined code pages and collations into a single entity Prague, Czech Republic (IUC23)

  11. "Windows" collations • Added for unique code pages(Example – Arabic) • Added for unique ordering (Example – French) • Removed for identical ordering(Example – Finnish_Swedish) Prague, Czech Republic (IUC23)

  12. 43 Windows Collations • Albanian • Arabic • Chinese_PRC • Chinese_PRC_Stroke • Chinese_Taiwan_Bopomofo • Chinese_Taiwan_Stroke • Cyrillic_General • Croatian • Czech • Danish_Norwegian • Estonian • Finnish_Swedish • French • Georgian_Modern_sort • German_PhoneBook • Greek • Hebrew • Hindi • Hungarian • Hungarian_Technical • Icelandic • Japanese • Japanese_Unicode • Korean_Wansung • Korean_Wansung_Unicode • Latin1_General • Latvian • Lithuanian • Lithuanian_Classic • FYRO Macedonian • Spanish (Spain) • Polish • Romanian • Slovak • Slovenian • Thai • Traditional_Spanish • Turkish • Ukrainian • Vietnamese   Prague, Czech Republic (IUC23)

  13. Windows collations, continued • Suffix meanings • _BIN (Binary) • _CI/_CS (Case sensitivity) • _AI/_AS (Accent sensitivity) • _KS - kanatype sensitivity (hiragana/katakana) • _WS - width sensitivity (full/half width) Prague, Czech Republic (IUC23)

  14. SQL Collations • Provided for backwards compatibility with prior versions of SQL Server Prague, Czech Republic (IUC23)

  15. SQL Collations • SQL_1xCompat_CP850 • SQL_Estonian_CP1257 • SQL_Latin1_General_Pref_CP437 • SQL_AltDiction_CP1253 • SQL_Hungarian_CP1250 • SQL_Latin1_General_Pref_CP850 • SQL_AltDiction_CP850 • SQL_Icelandic_Pref_CP1 • SQL_Latvian_CP1257 • SQL_AltDiction_Pref_CP850 • SQL_Latin1_General_CP1 • SQL_Lithuanian_CP1257 • SQL_Croatian_CP1250 • SQL_Latin1_General_CP1250 • SQL_MixDiction_CP1253 • SQL_Czech_CP1250 • SQL_Latin1_General_CP1251 • SQL_Polish_CP1250 • SQL_Danish_Pref_CP1 • SQL_Latin1_General_CP1253 • SQL_Romanian_CP1250 • SQL_EBCDIC037_CP1 • SQL_Latin1_General_CP1254 • SQL_Scandinavian_CP850 • SQL_EBCDIC273_CP1 • SQL_Latin1_General_CP1255 • SQL_Scandinavian_Pref_CP850 • SQL_EBCDIC277_CP1 • SQL_Latin1_General_CP1256 • SQL_Slovak_CP1250 • SQL_EBCDIC278_CP1 • SQL_Latin1_General_CP1257 • SQL_Slovenian_CP1250 • SQL_EBCDIC280_CP1 • SQL_Latin1_General_CP437 • SQL_SwedishPhone_Pref_CP1 • SQL_EBCDIC284_CP1 • SQL_Latin1_General_CP850 • SQL_SwedishStd_Pref_CP1 • SQL_EBCDIC285_CP1 • SQL_Latin1_General_Pref_CP1 • SQL_Ukrainian_CP1251 • SQL_AltDiction_CP1253 • SQL_Hungarian_CP1250  • SQL_Latin1_General_Pref_CP850   Prague, Czech Republic (IUC23)

  16. Collation at four levels • Server • Database • Column • Expression Prague, Czech Republic (IUC23)

  17. At the server level • Acts as a default for all databases • Can be changed with RebuildM.exe in the tools\BINN dir • Querying the server collation: SELECT CONVERT(char, SERVERPROPERTY('collation')) Prague, Czech Republic (IUC23)

  18. At the database level • Every database has a collation (default is the server collation) • Collation can be changed under some circumstances Prague, Czech Republic (IUC23)

  19. At the column level • Overrides database level collation • Specifies code page for non-Unicode columns • Again, can be changed under some circumstances • No multilingual columns with separate collations Prague, Czech Republic (IUC23)

  20. At the expression level • Can be used to override any other collation • uses the COLLATE keyword Prague, Czech Republic (IUC23)

  21. Metadata in System Tables • All stored as Unicode no matter what the database collation is • Unicode 2.0 repertoire is used for identifiers (use brackets or quotes around anything else) Prague, Czech Republic (IUC23)

  22. More on the COLLATE keyword COLLATE [<Windows_Collation_name>|<SQL_Collation_Name] • Specific rules of precedence: • Explicit (two explicits == runtime error) • Implicit (two implicits == no collation) • Default • <no collation> Prague, Czech Republic (IUC23)

  23. Limitations • Features people will want for future versions • LCID --> Collation • ISO string <--> Collation • Creating custom collations? Prague, Czech Republic (IUC23)

  24. References • http://microsoft.com/globaldev/ • “International Features in Microsoft SQL Server 2000”(by Michael Kaplan) at http://msdn.microsoft.com/ Prague, Czech Republic (IUC23)

  25. Questions? Prague, Czech Republic (IUC23)

  26. Don’t Forget Your Evaluations! Unicode and Collation Support in Microsoft SQL Server Prague, Czech Republic (IUC23)

More Related