OpenTop support for the Unicode™ Standard
The Unicode™ Standard is a specification
that allocates a unique number (also called a code-point) to every
character from all the languages of the world. The Unicode code space
runs from 0 to 0x10FFFF, so just over 1.1 million code-points can be
uniquely identified by their Unicode value.
The Unicode standard has been very widely adopted;
it is the prescribed character set for XML documents and the native character set
used by Windows NT™ and the Java™ language.
However, despite its ubiquity, support for
Unicode characters within C++ and its standard library is poor.
At least, that was the case before OpenTop arrived.
Configurable Internal Character Representation
OpenTop provides an extremely flexible solution to the representation
of Unicode characters, allowing you to select the scheme which best meets
the needs of your application.
| Character Type | Size (in bits) | Encoding | Available Range |
| char | 8 | UTF-8 | 0-0x10FFFF |
| char | 8 | ISO-8859-1 | 0-0xFF |
| wchar_t | 16 | UTF-16 | 0-0x10FFFF |
| wchar_t | 32 | UCS-4 | 0-0x10FFFF |
This table shows the available methods of representing Unicode characters
within OpenTop. You choose which native character type you will use within your
application, and let OpenTop do the rest.
Applications which need to support the full Unicode character range,
yet must also integrate with legacy APIs, can
select UTF-8 as the internal character representation.
The UTF-8 encoding has several advantages, not least of which is the fact that
ASCII characters (0x00-0x7F) are represented as unaltered single byte values
which can be exchanged with legacy APIs without difficulty.
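To make the ASCII-compatibility point concrete, here is a minimal sketch of how a single code-point is turned into UTF-8 bytes. The function name `appendUtf8` is a hypothetical helper written for this illustration; it is not part of OpenTop, which performs this kind of conversion for you.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an OpenTop function): append the UTF-8
// encoding of one Unicode code-point to a std::string.
void appendUtf8(std::string& out, std::uint32_t cp)
{
    // Reject values outside the Unicode range and the surrogate block,
    // neither of which may appear in well-formed UTF-8.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    if (cp < 0x80) {
        // ASCII: stored as a single, unaltered byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Calling `appendUtf8(s, 'A')` adds the single byte 0x41, while `appendUtf8(s, 0x20AC)` (the euro sign) adds the three bytes 0xE2 0x82 0xAC. Every byte of a multi-byte sequence has its top bit set, so it can never be mistaken for an ASCII character.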
If you would prefer to use wide characters (wchar_t),
then OpenTop supports this too. Note that Unicode defines a character as
having a code-point value in the range 0-0x10FFFF, which is well beyond the range of wchar_t
on the many platforms where it is only 16 bits wide. In this case OpenTop employs the UTF-16
encoding, which allows characters above U+FFFF to be represented via surrogate pairs.
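The surrogate-pair arithmetic can be sketched as follows. The names `toUtf16` and `fromSurrogates` are hypothetical, chosen for this illustration only; they are not OpenTop functions, and the sketch assumes its input is a valid code-point outside the surrogate block.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenTop function): encode one code-point
// as one or two 16-bit UTF-16 code units.
// Assumes cp <= 0x10FFFF and cp is not itself a surrogate value.
std::vector<std::uint16_t> toUtf16(std::uint32_t cp)
{
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        return { static_cast<std::uint16_t>(cp) };
    }
    // Subtract 0x10000 to leave a 20-bit value, then split it into
    // two 10-bit halves carried by the high and low surrogates.
    const std::uint32_t v = cp - 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) };
}

// Hypothetical helper: recombine a high/low surrogate pair.
std::uint32_t fromSurrogates(std::uint16_t high, std::uint16_t low)
{
    return 0x10000
         + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, and the largest code-point, U+10FFFF, becomes 0xDBFF 0xDFFF.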
Standard String Class
Wherever possible, OpenTop uses features from the C++ standard library,
so character strings are represented by std::basic_string<T>.
OpenTop does allow you to provide a custom String class, but using the standard std::basic_string<T>
to pass Unicode character strings to and from the library makes the integration with
existing libraries and application code simple.
OpenTop deals with the conversion of Unicode strings when
calling legacy APIs, and provides a StringIterator class for accessing
characters within a string when a multi-byte character encoding is being used.
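To show why character-level access needs help when a multi-byte encoding is in use, the sketch below walks a UTF-8 `std::string` and collects its code-points. It is a standalone illustration and does not reproduce the interface of OpenTop's StringIterator; the function name `codePoints` is hypothetical, and the code assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration (not OpenTop's StringIterator): decode a
// UTF-8 string into a sequence of code-points.
// Assumes well-formed UTF-8; no validation is performed.
std::vector<std::uint32_t> codePoints(const std::string& s)
{
    std::vector<std::uint32_t> result;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;

        // The lead byte tells us how many bytes the character occupies.
        if (lead < 0x80)      { len = 1; cp = lead; }
        else if (lead < 0xE0) { len = 2; cp = lead & 0x1F; }
        else if (lead < 0xF0) { len = 3; cp = lead & 0x0F; }
        else                  { len = 4; cp = lead & 0x07; }

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);

        result.push_back(cp);
        i += len;
    }
    return result;
}
```

A string holding "A", the euro sign and U+1F600 occupies eight bytes but contains only three characters, so `s.size()` and `s[i]` count and index bytes rather than characters.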
Unicode Encodings
OpenTop provides a framework for encoding and decoding Unicode characters.
The code conversion framework is simple to use and easily extended to support
any encoding required by your application.
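As a rough idea of the work an encoder and decoder pair performs, the sketch below transcodes ISO-8859-1 bytes to UTF-8. It is not written against OpenTop's conversion framework, whose interfaces are not shown here; `latin1ToUtf8` is a hypothetical free function. It relies on the fact that each ISO-8859-1 byte value equals its Unicode code-point.

```cpp
#include <string>

// Hypothetical function (not part of OpenTop): convert ISO-8859-1
// (Latin1) text to UTF-8.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size());
    for (char c : latin1) {
        // Decode step: a Latin1 byte is its own code-point (0-0xFF).
        const unsigned char b = static_cast<unsigned char>(c);
        // Encode step: one byte for ASCII, two bytes for 0x80-0xFF.
        if (b < 0x80) {
            utf8 += c;
        } else {
            utf8 += static_cast<char>(0xC0 | (b >> 6));
            utf8 += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return utf8;
}
```

The single Latin1 byte 0xE9 (e with acute accent) becomes the two UTF-8 bytes 0xC3 0xA9, while ASCII text passes through unchanged.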
OpenTop is supplied with the following Unicode encodings as standard:
- ASCII
- IBM850
- ISO-8859-1 (Latin1), ISO-8859-2/15
- UTF-8
- UTF-16, UTF-16LE, UTF-16BE
- Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258
If the encoding you require is not listed and you do not wish to write your own
encoder, ElCel Technology will happily provide a quotation on request.