|
OpenTop 1.5 | |||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | Cross-Platform C++ | ||||||
| SUMMARY: CONSTRUCTOR | METHOD | DETAIL: CONSTRUCTOR | METHOD | |||||||
#include "ot/base/SystemCodeConverter.h"

The char type is only 8 bits wide, so it can hold only 256 distinct values and is much too small to hold the entire Unicode character range of U+0000 through U+10FFFF. When OpenTop is configured to use char characters, it can therefore either represent Unicode characters as a sequence of UTF-8 encoded char values or, if only a very limited range of Unicode characters is required, represent the characters available in ISO-8859-1 as their corresponding char values.
The size of a wchar_t is not uniformly defined on all platforms, so OpenTop offers a choice of two encoding schemes when configured to use wchar_t: UCS-4 for 32-bit implementations and UTF-16 for 16-bit implementations.
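To make the difference between the variable-length encodings concrete, the following is a minimal sketch of how a single Unicode code point maps to a sequence of 8-bit UTF-8 values or 16-bit UTF-16 values. The function names encodeUtf8 and encodeUtf16 are hypothetical helpers written for this illustration; they are not part of OpenTop, and the sketch assumes the caller passes a valid code point (no surrogate values, nothing above U+10FFFF).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenTop function): encodes one code point
// as 1 to 4 UTF-8 bytes. Assumes cp is a valid Unicode scalar value.
std::vector<unsigned char> encodeUtf8(std::uint32_t cp)
{
    std::vector<unsigned char> out;
    if (cp < 0x80) {
        // 0xxxxxxx: identical to ASCII
        out.push_back(static_cast<unsigned char>(cp));
    } else if (cp < 0x800) {
        // 110xxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper (not an OpenTop function): encodes one code point
// as 1 or 2 UTF-16 values. Assumes cp is a valid Unicode scalar value.
std::vector<std::uint16_t> encodeUtf16(std::uint32_t cp)
{
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit value
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Supplementary planes: high surrogate followed by low surrogate
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
    return out;
}
```

For example, under these assumptions U+20AC (the euro sign) occupies three values in UTF-8 but one in UTF-16, while U+1F600 occupies four values in UTF-8 and a surrogate pair in UTF-16. In UCS-4 every code point occupies exactly one 32-bit value.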
The following table gives a quick reference to the Unicode encodings available in OpenTop:
| Character Type | Size (bits) | Encoding | Preprocessor Macro |
|---|---|---|---|
| char | 8 | UTF-8 | - |
| char | 8 | ISO-8859-1 | OT_LATIN1 |
| wchar_t | 16 | UTF-16 | OT_WCHAR |
| wchar_t | 32 | UCS-4 | OT_WCHAR |
As you can see from the table, if no pre-processor macro is defined OpenTop assumes that Unicode characters will be represented by sequences of char values encoded in UTF-8. This is the default configuration. If you wish to use one of the other configurations, then you must build and link with the required OpenTop configuration and ensure that you consistently specify the appropriate pre-processor macro before including any OpenTop include files in all your source files.
| Method Summary | |
|---|---|
| static Result | FromInternalEncoding(UCS4Char& ch, const CharType* from, const CharType* from_end, const CharType*& from_next) Decodes a sequence of CharType values, encoded in the native OpenTop encoding, into the code-point value of the first Unicode character. |
| static size_t | GetCharSequenceLength(UCharType ch) Returns the number of CharType values that are required to encode the passed Unicode character in the native OpenTop encoding. |
| static String | GetInternalEncodingName() Returns the name of the native OpenTop encoding scheme. |
| static size_t | GetMaximumCharSequenceLength() Returns the maximum number of CharType elements that may be used to encode a single Unicode character. |
| static bool | IsSequenceStartChar(UCharType ch) Tests the passed value ch to see if it marks the start of an encoded sequence, a standalone character or a trailing value. |
| static bool | IsValidCharSequence(const CharType* from, size_t len) Tests the passed CharType sequence starting at from for a length of len to see if it represents a properly encoded Unicode character in the native OpenTop encoding. |
| static Result | TestEncodedSequence(const CharType* from, const CharType* from_end, const CharType*& from_next) Tests a sequence of CharType values to check that it is encoded according to the native OpenTop encoding. |
| static String | ToInternalEncoding(UCS4Char ch) Returns the Unicode character ch as a String containing a sequence of CharType values encoded in the native OpenTop encoding. |
| static Result | ToInternalEncoding(UCS4Char ch, CharType* to, const CharType* to_limit, CharType*& to_next) Converts a Unicode character value into a sequence of CharType values encoded in the native OpenTop encoding. |
| Methods inherited from class ot::CodeConverterBase |
IsLegalUTF16(const wchar_t*, size_t), IsLegalUTF8(const Byte*, size_t), UTF8Decode(UCS4Char&, const Byte*, const Byte*, const Byte*&), UTF8Encode(UCS4Char, Byte*, const Byte*, Byte*&) |
| Method Detail |
static Result FromInternalEncoding(UCS4Char& ch,
const CharType* from,
const CharType* from_end,
const CharType*& from_next)
Parameters: ch, from, from_end, from_next
Exceptions: NullPointerException
static size_t GetCharSequenceLength(UCharType ch)
OpenTop may be configured to use one of several Unicode encoding schemes, two of which (UTF-16 and UTF-8) encode Unicode characters into a variable-length sequence of CharType values. All the other supported encodings represent a Unicode character using a single CharType value.
No matter what encoding scheme is employed, OpenTop can always determine the number of CharType elements that are needed to encode a single Unicode character simply by inspecting the first CharType value in the sequence.
In the case of UTF-16, the length of the sequence is 1 unless ch is a surrogate pair start character (0xD800-0xDBFF) in which case the length is 2.
In the case of UTF-8, the sequence length can be established by counting the consecutive high-order bits set to '1' in the passed char ch. If the high-order bit is not set, the passed character is equivalent to an ASCII character and the sequence has a length of 1. In common with the rest of OpenTop, this method does not recognize UTF-8 sequences longer than 4 bytes; lead bytes that indicate longer sequences are treated as indicating a sequence of length 1.
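The rules described for GetCharSequenceLength can be sketched in a few lines. This is an illustration of the technique only, not the OpenTop implementation: the names utf8SequenceLength and utf16SequenceLength are hypothetical, and the treatment of UTF-8 trailing bytes (10xxxxxx) as length 1 is an assumption, since the description above does not say how they are handled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not OpenTop code): derives the UTF-8 sequence
// length from the pattern of high-order bits in the first byte.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII equivalent
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    // Lead bytes announcing more than 4 bytes are treated as length 1,
    // as described above. Trailing bytes (10xxxxxx) also land here;
    // returning 1 for them is an assumption of this sketch.
    return 1;
}

// Hypothetical helper (not OpenTop code): a UTF-16 sequence has length 2
// only when the first value is a surrogate pair start (0xD800-0xDBFF).
std::size_t utf16SequenceLength(std::uint16_t first)
{
    return (first >= 0xD800 && first <= 0xDBFF) ? 2 : 1;
}
```

Because both functions look only at the first value, a caller can step through a well-formed buffer one character at a time by advancing the read position by the returned length.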
Parameters: ch
static String GetInternalEncodingName()
static size_t GetMaximumCharSequenceLength()
static bool IsSequenceStartChar(UCharType ch)
Parameters: ch
static bool IsValidCharSequence(const CharType* from,
size_t len)
Parameters: from, len
static Result TestEncodedSequence(const CharType* from,
const CharType* from_end,
const CharType*& from_next)
Parameters: from, from_end, from_next
Exceptions: NullPointerException
static String ToInternalEncoding(UCS4Char ch)
Parameters: ch
Exceptions: IllegalCharacterException
static Result ToInternalEncoding(UCS4Char ch,
CharType* to,
const CharType* to_limit,
CharType*& to_next)
Parameters: ch, to, to_limit, to_next
Exceptions: NullPointerException