Converting non-Latin characters to Unicode in SQL databases

Area: SQL Server 2005 | Level: Intermediate

I've recently been working on an interesting task for a client. The client has a logistics application with a SQL Server 2000 back end and is upgrading to a new version of the application. The new version runs on SQL Server 2005 and is Unicode-compliant, which is the key to this tale.

The (Dutch) customer has a subsidiary in Poland that uses the logistics application over a remote connection, so the data is accessed from Poland but is physically stored on a database server in Holland. The present database is not Unicode-compliant, so all strings are stored in char and varchar data type fields. The Dutch server is configured with a "normal" Western code page 1252 and can support all characters for English-speaking and Western European subsidiaries, but for Poland the system breaks down a bit.

The Polish alphabet is based on the Latin alphabet and has 32 letters, nine of which carry diacritics. If a Polish user on a Windows workstation configured for code page 1250 enters "WARSZAWA" in an address field, the name contains only unaccented letters, which have the same byte values in code pages 1250 and 1252, so that is exactly what gets saved in the database and displayed at all locations and in all circumstances.

If, however, she types in "ŁÓDŹ" or "KĄTY WROCŁAWSKIE", what gets saved in the Dutch database is "£ÓD" (the byte for the final "Ź" has no printable character in code page 1252) and "K¥TY WROC£AWSKIE" respectively. The Polish user still sees the correct Polish characters on her display, because her workstation interprets the stored bytes with code page 1250, but a Dutch or French user won't be able to read the Polish names or addresses properly.
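The effect is easy to reproduce outside the database. The following is only a rough Python sketch, not the SQL script that accompanies the article: the function names are made up for illustration, and the only things taken from the text are the two code pages and the sample place names. It encodes a string the way a code page 1250 workstation would, reads the same bytes back as code page 1252, and then shows that reading the bytes as code page 1250 recovers the original text.

```python
# Sketch only: helper names are hypothetical, not part of the article's script.

def as_seen_on_cp1252(text: str) -> str:
    """Encode text as a cp1250 client would, then read the bytes as cp1252."""
    raw = text.encode("cp1250")
    # Bytes that cp1252 leaves undefined (such as 0x8F) become U+FFFD.
    return raw.decode("cp1252", errors="replace")


def to_unicode(raw: bytes) -> str:
    """Interpret stored single-byte data with the code page it was typed in."""
    return raw.decode("cp1250")


if __name__ == "__main__":
    for name in ("WARSZAWA", "ŁÓDŹ", "KĄTY WROCŁAWSKIE"):
        stored = name.encode("cp1250")
        print(name, "->", as_seen_on_cp1252(name), "->", to_unicode(stored))
```

The point of the sketch is that the stored bytes themselves are intact. Only the code page used to interpret them differs, which is why a conversion to Unicode can in principle recover the Polish text, provided it is known which rows were entered under code page 1250.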

(continues as PDF)

Download the entire article as a PDF file...
The SQL script referred to in the text can be downloaded here: UnicodeScript.zip.

360Data