OpenTop support for the Unicode™ Standard

The Unicode™ Standard is a specification that allocates a unique number (also called a code-point) to every character from all the languages of the world. The Unicode code space runs from 0 to 0x10FFFF, which gives room for just over 1.1 million code-points, each of which uniquely identifies a character.

The Unicode standard has been very widely adopted; it is the prescribed character set for XML documents and the native character set used by Windows NT™ and the Java™ language. However, despite this ubiquity, support for Unicode characters within C++ and its standard library is poor. At least, that was the case before OpenTop arrived.

Configurable Internal Character Representation

OpenTop provides an extremely flexible solution to the representation of Unicode characters, allowing you to select the scheme which best meets the needs of your application.

Character Type   Size (in bits)   Encoding     Available Range
char             8                UTF-8        0-0x10FFFF
char             8                ISO-8859-1   0-0xFF
wchar_t          16               UTF-16       0-0x10FFFF
wchar_t          32               UCS-4        0-0x10FFFF

This table shows the available methods of representing Unicode characters within OpenTop. You choose which native character type you will use within your application, and let OpenTop do the rest.
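As a rough illustration of the wchar_t rows in the table, the width of wchar_t on the build platform determines which wide encoding applies. The helper below is hypothetical (wideEncodingName is not an OpenTop function) and only mirrors the mapping shown in the table.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of OpenTop: report which wide-character
// encoding from the table matches the platform's wchar_t width.
std::string wideEncodingName()
{
    const std::size_t bits = sizeof(wchar_t) * CHAR_BIT;
    if (bits == 16) return "UTF-16";  // characters above U+FFFF need surrogate pairs
    if (bits == 32) return "UCS-4";   // one wchar_t holds any code-point
    return "unsupported";             // width not covered by the table
}
```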

Applications which need to support the full Unicode character range, yet must also integrate with legacy APIs, can select UTF-8 as the internal character representation. The UTF-8 encoding has several advantages, not least that ASCII characters (0x00-0x7F) are represented as unaltered single-byte values, which can be exchanged with legacy APIs without difficulty.
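To make the ASCII-compatibility point concrete, here is a minimal sketch of UTF-8 encoding for a single code-point. The function name encodeUtf8 is an assumption made for illustration and is not taken from the OpenTop API; the bit layout itself follows the standard UTF-8 definition.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an OpenTop API): return the UTF-8 byte sequence
// for one Unicode code-point.
std::string encodeUtf8(std::uint32_t cp)
{
    // Precondition: a valid scalar value (at most 0x10FFFF, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        // ASCII: emitted unchanged as a single byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encodeUtf8(0x41) yields the single byte 'A', so plain ASCII text passes through to a legacy API untouched, while a code-point such as U+20AC becomes a three-byte sequence.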

If you would prefer to use wide characters (wchar_t), then OpenTop supports this too. Note that Unicode defines a character as having a code-point value in the range 0-0x10FFFF, which exceeds the range of wchar_t on the many platforms where it is only 16 bits wide. In this case OpenTop employs the UTF-16 encoding, which allows characters above U+FFFF to be represented via surrogate pairs. Where wchar_t is 32 bits wide, each character fits in a single UCS-4 value.
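The surrogate-pair arithmetic can be sketched as follows. This is an illustration of the standard UTF-16 scheme only; encodeUtf16 is a hypothetical name, and std::u16string is used here simply to get a fixed 16-bit unit regardless of the platform's wchar_t width.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an OpenTop API): return the UTF-16 code units
// for one Unicode code-point.
std::u16string encodeUtf16(std::uint32_t cp)
{
    // Precondition: a valid scalar value (at most 0x10FFFF, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // Fits in one 16-bit unit.
        out += static_cast<char16_t>(cp);
    } else {
        // Subtract 0x10000 to leave a 20-bit value, then split it 10/10.
        const std::uint32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 | (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 | (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

For example, U+1F600 lies above U+FFFF and is encoded as the pair 0xD83D, 0xDE00.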

Standard String Class

Wherever possible, OpenTop uses features from the C++ standard library, so character strings are represented by std::basic_string<T>.

OpenTop does allow you to provide a custom String class, but using the standard std::basic_string<T> to pass Unicode character strings to and from the library makes the integration with existing libraries and application code simple.

OpenTop deals with the conversion of Unicode strings when calling legacy APIs, and provides a StringIterator class for accessing characters within a string when a multi-byte character encoding is being used.
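The sketch below shows why character-level iteration is needed when a std::string holds UTF-8: one character may span several char elements. It does not reproduce OpenTop's StringIterator interface, which is not described here; sequenceLength and codePoints are hypothetical names, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: length in bytes of the UTF-8 sequence that starts
// with lead byte b, or 0 if b cannot start a sequence.
std::size_t sequenceLength(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

// Hypothetical helper: walk a UTF-8 string one character at a time and
// collect the decoded code-points. Assumes well-formed input.
std::vector<std::uint32_t> codePoints(const std::string& s)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        const std::size_t len = sequenceLength(lead);
        assert(len != 0 && i + len <= s.size());

        // Keep the payload bits of the lead byte, then append 6 bits
        // from each continuation byte.
        std::uint32_t cp = (len == 1)
            ? lead
            : static_cast<std::uint32_t>(lead & (0x7F >> len));
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;  // advance by one character, not one byte
    }
    return out;
}
```

A six-byte string containing 'A', U+00E9 and U+20AC therefore yields three code-points, whereas indexing the std::string directly would expose six separate bytes.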

Unicode Encodings

OpenTop provides a framework for encoding and decoding Unicode characters. The code conversion framework is simple to use and easily extended to support any encoding required by your application.

OpenTop is supplied with the following character encodings as standard:

  • IBM850
  • ISO-8859-1 (Latin1), ISO-8859-2/15
  • UTF-8
  • UTF-16, UTF-16LE, UTF-16BE
  • Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258
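To give a feel for what an extensible conversion framework involves, here is a minimal sketch of a decoder interface with one concrete encoding. The Decoder and Latin1Decoder classes are assumptions made purely for illustration and are not OpenTop's actual classes. ISO-8859-1 is chosen because its byte values map directly onto the first 256 Unicode code-points, so no lookup table is needed.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical interface, NOT the OpenTop API: converts encoded bytes
// into Unicode code-points.
class Decoder {
public:
    virtual ~Decoder() = default;
    virtual std::vector<std::uint32_t> decode(const std::string& bytes) const = 0;
};

// ISO-8859-1 (Latin1): every byte value 0x00-0xFF is the code-point itself.
class Latin1Decoder : public Decoder {
public:
    std::vector<std::uint32_t> decode(const std::string& bytes) const override
    {
        std::vector<std::uint32_t> out;
        out.reserve(bytes.size());
        for (char c : bytes) {
            // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
            out.push_back(static_cast<unsigned char>(c));
        }
        return out;
    }
};
```

Under this design, supporting another single-byte code page would mean adding one more subclass that maps each byte through a 256-entry table.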

If the encoding you require is not listed and you do not wish to write your own encoder, ElCel Technology will happily provide a quotation on request.