Cross-Platform C++

ot::cvt
class CodeConverter

#include "ot/cvt/CodeConverter.h"

Base classes: ot::ManagedObject, ot::CodeConverterBase

Common base class for all code converters. A CodeConverter translates Unicode characters into byte sequences and vice versa. The design of CodeConverter is based on the std::codecvt class from the C++ standard library.

OpenTop comes supplied with CodeConverters for many common encodings such as UTF-8, UTF-16, Latin1, Windows-125x and many others.

See also:
CodeConverterFactory



Constructor/Destructor Summary
CodeConverter()
         Creates a CodeConverter with default values.

Method Summary
 virtual bool alwaysNoConversion() const
         Tests if this CodeConverter is using the same encoding as the native OpenTop encoding.
 virtual String decode(const ByteString& from)
         Decodes a sequence of bytes contained within a ByteString into a Unicode String.
 virtual Result decode(const Byte* from, const Byte* from_end, const Byte*& from_next, CharType* to, CharType* to_limit, CharType*& to_next)
         Decodes an array of bytes into an array of CharType values that represent Unicode characters in the native OpenTop encoding.
static String Decode(const String& encoding, const ByteString& from)
         Static function to decode a sequence of encoded bytes contained within a ByteString into a Unicode String.
 virtual ByteString encode(const String& from)
         Encodes a sequence of Unicode characters into a sequence of bytes.
 virtual Result encode(const CharType* from, const CharType* from_end, const CharType*& from_next, Byte* to, Byte* to_limit, Byte*& to_next)
         Encodes an array of CharType characters, representing Unicode characters in the native OpenTop encoding, into a sequence of bytes.
static ByteString Encode(const String& encoding, const String& from)
         Static function to encode a string of Unicode characters (represented as a sequence of CharType values in the native OpenTop encoding) into a sequence of bytes using the encoding name provided.
 virtual size_t getDecodedLength(const Byte* from, const Byte* from_end) const
         Returns the number of Unicode characters that would be created by decoding the array of bytes starting at from.
 virtual String getEncodingName() const
         Returns the canonical name for the encoding handled by this CodeConverter.
 CharAction getInvalidCharAction() const
         Returns the policy for dealing with invalid byte sequences or Unicode characters which cannot be mapped into the internal OpenTop character encoding.
 UCS4Char getInvalidCharReplacement() const
         Returns the Unicode character that will be used when this CodeConverter detects an invalid byte sequence.
 virtual size_t getMaxEncodedLength() const
         Returns the maximum number of bytes used to encode a single Unicode character up to U+10FFFF.
 CharAction getUnmappableCharAction() const
         Returns the policy for dealing with Unicode characters that cannot be mapped into the target encoding.
 UCS4Char getUnmappableCharReplacement() const
         Returns the Unicode character that will be used when this CodeConverter detects an unmappable Unicode character.
protected  virtual Result handleEncodingToInternalError(UCS4Char ch, CharType* to, const CharType* to_limit, CharType*& to_next)
         Helper function called by derived classes' decode() method when they encounter a Unicode character which cannot be mapped into the native OpenTop encoding.
protected  void handleInvalidByteSequence(const Byte* from, size_t len) const
         Helper function that simply throws a MalformedInputException.
protected  virtual Result handleUnmappableCharacter(UCS4Char ch, Byte* to, Byte* to_limit, Byte*& to_next)
         Helper function called by derived classes' encode() method when they encounter an unmappable Unicode character.
protected  void internalEncodingError(const CharType* from, size_t len) const
         Helper function called by derived classes when they encounter a badly encoded internal CharType array.
 void setInvalidCharAction(CharAction eAction)
         Sets the policy for dealing with badly encoded byte sequences or encoded sequences which result in a Unicode character which cannot be mapped into the native OpenTop encoding.
 void setInvalidCharReplacement(UCS4Char ch)
         Sets the replacement Unicode character used when the CodeConverter detects an invalid byte sequence.
 void setUnmappableCharAction(CharAction eAction)
         Sets the policy for dealing with Unicode characters that cannot be mapped into the target encoding.
 void setUnmappableCharReplacement(UCS4Char ch)
         Sets the replacement Unicode character used when the CodeConverter detects a Unicode character that cannot be encoded into the target encoding.
protected  void throwUnsupported(unsigned long illegalChar) const
        

Methods inherited from class ot::CodeConverterBase
IsLegalUTF16(const wchar_t*, size_t), IsLegalUTF8(const Byte*, size_t), UTF8Decode(UCS4Char&, const Byte*, const Byte*, const Byte*&), UTF8Encode(UCS4Char, Byte*, const Byte*, Byte*&)

Methods inherited from class ot::ManagedObject
addRef(), getRefCount(), onFinalRelease(), operator=(const ManagedObject&), release()

Enumerations

enum CharAction {
  abort,
  replace
};


Constructor/Destructor Detail

CodeConverter

 CodeConverter()
Creates a CodeConverter with default values.


Method Detail

alwaysNoConversion

virtual bool alwaysNoConversion() const
Tests if this CodeConverter is using the same encoding as the native OpenTop encoding. If so, the reading and writing of characters can be optimized to bypass the encoding process.

Returns:
true if this CodeConverter encodes Unicode characters into the native OpenTop encoding; false otherwise

decode

virtual String decode(const ByteString& from)
Decodes a sequence of bytes contained within a ByteString into a Unicode String. The Unicode characters within the returned String are represented as a sequence of CharType values in the native OpenTop encoding.

Parameters:
from - a ByteString containing the byte sequence to decode.
Returns:
a String containing the decoded Unicode characters, represented as a sequence of CharType values in the native OpenTop encoding.
Exceptions:
MalformedInputException - if an invalid byte sequence is detected and the policy for this CodeConverter is to abort in this situation.
See also:
encode()
Since:
OpenTop 1.5

decode

virtual Result decode(const Byte* from,
                      const Byte* from_end,
                      const Byte*& from_next,
                      CharType* to,
                      CharType* to_limit,
                      CharType*& to_next)
Decodes an array of bytes into an array of CharType values that represent Unicode characters in the native OpenTop encoding.

Parameters:
from - pointer to the start of the byte array to decode
from_end - pointer to the next byte past the end of the byte array
from_next - return parameter which holds a pointer to the next byte in the array which has yet to be processed
to - pointer to the start of a CharType array which will hold the result of the decoding operation
to_limit - pointer to the next CharType past the end of the result array
to_next - return parameter which holds a pointer to the next CharType in the result array
Returns:
a Result code indicating the success of the operation.
Exceptions:
MalformedInputException - if an invalid byte sequence is detected and the policy for this CodeConverter is to abort in this situation.
UnmappableCharacterException - if an unmappable Unicode character is found in the encoded input and the policy for this CodeConverter is to abort in this situation.
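
The six-pointer convention above follows std::codecvt: the caller supplies an input range and an output range, and the two "next" parameters report how far each was consumed. The following is a minimal, self-contained sketch of that calling pattern. It does not use OpenTop: the Byte and CharType typedefs, the Result values (ok, outputExhausted) and the decodeLatin1 and decodeAll functions are all hypothetical stand-ins chosen for illustration, and the actual Result codes returned by CodeConverter may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins; these are NOT the OpenTop types.
typedef unsigned char Byte;
typedef char32_t CharType;
enum Result { ok, outputExhausted };

// Decodes ISO-8859-1 bytes, where each byte maps directly to the code point
// of the same value. The parameters follow the six-pointer convention.
Result decodeLatin1(const Byte* from, const Byte* from_end, const Byte*& from_next,
                    CharType* to, CharType* to_limit, CharType*& to_next)
{
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        if (to_next == to_limit) {
            return outputExhausted;   // caller must drain the output buffer
        }
        *to_next++ = static_cast<CharType>(*from_next++);
    }
    return ok;
}

// Drives the decoder with a deliberately small output buffer, resuming from
// the reported input position each time the buffer fills up.
std::u32string decodeAll(const Byte* from, const Byte* from_end)
{
    std::u32string out;
    CharType buffer[4];
    const Byte* next = from;
    Result r;
    do {
        CharType* to_next = buffer;
        r = decodeLatin1(next, from_end, next, buffer, buffer + 4, to_next);
        out.append(buffer, to_next);   // copy only the elements actually written
    } while (r == outputExhausted);
    return out;
}
```

The point of the pattern is that neither buffer needs to hold the whole text: the caller can convert arbitrarily long input in fixed-size pieces by restarting from from_next.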

Decode

static String Decode(const String& encoding,
                     const ByteString& from)
Static function to decode a sequence of encoded bytes contained within a ByteString into a Unicode String. The Unicode characters within the returned String are represented as a sequence of CharType values in the native OpenTop encoding.

This function is provided for convenience, but applications which perform a lot of decoding or which need to configure a CodeConverter to behave in a specific way when errors are encountered should consider using the decode() instance method instead.

Parameters:
encoding - the name of the encoding which is used to request a suitable CodeConverter from the registered CodeConverterFactory.
from - a ByteString containing the byte sequence to decode.
Returns:
a String containing the decoded Unicode characters, represented as a sequence of CharType values in the native OpenTop encoding.
Exceptions:
MalformedInputException - if an invalid byte sequence is detected and the policy for the CodeConverter provided by the registered CodeConverterFactory is to abort in this situation.
See also:
Encode()
Since:
OpenTop 1.5

encode

virtual ByteString encode(const String& from)
Encodes a sequence of Unicode characters into a sequence of bytes.

Parameters:
from - String containing a sequence of CharType values representing Unicode characters in the native OpenTop encoding.
Exceptions:
UnmappableCharacterException - if an unmappable Unicode character is detected and the policy for this CodeConverter is to abort in this situation.
See also:
decode()
Since:
OpenTop 1.5

encode

virtual Result encode(const CharType* from,
                      const CharType* from_end,
                      const CharType*& from_next,
                      Byte* to,
                      Byte* to_limit,
                      Byte*& to_next)
Encodes an array of CharType characters, representing Unicode characters in the native OpenTop encoding, into a sequence of bytes.

Parameters:
from - pointer to the start of the CharType array to encode.
from_end - pointer to the next CharType past the end of the input array.
from_next - return parameter which holds a pointer to the next CharType in the array which has yet to be processed.
to - pointer to the start of a byte array which will hold the result of the encoding operation.
to_limit - pointer to the next byte past the end of the result array.
to_next - return parameter which holds a pointer to the next byte in the result array.
Returns:
a Result code indicating the success of the operation.
Exceptions:
UnmappableCharacterException - if an unmappable Unicode character is detected and the policy for this CodeConverter is to abort in this situation.

Encode

static ByteString Encode(const String& encoding,
                         const String& from)
Static function to encode a string of Unicode characters (represented as a sequence of CharType values in the native OpenTop encoding) into a sequence of bytes using the encoding name provided. This function is provided as a convenience for applications which do not need to encode many Strings. It is functionally equivalent to the following code (with error handling removed):
    RefPtr<CodeConverter> rpConv = 
        CodeConverterFactory::GetInstance().getConverter(encoding);
    return rpConv->encode(from);
Applications which perform a lot of encoding or which need to configure a CodeConverter to behave in a specific way when errors are encountered should consider using the encode() instance method instead.

Parameters:
encoding - the name of the encoding which is used to request a suitable CodeConverter from the registered CodeConverterFactory.
from - the String to encode, containing a sequence of CharType values representing Unicode characters in the native OpenTop encoding.
Exceptions:
UnsupportedEncodingException - if the registered CodeConverterFactory is unable to create a CodeConverter for the specified encoding.
UnmappableCharacterException - if an unmappable Unicode character is detected and the policy for the CodeConverter provided by the registered CodeConverterFactory is to abort in this situation. (The default action is to replace the unmappable character with a mappable replacement character)
See also:
Decode()
Since:
OpenTop 1.5

getDecodedLength

virtual size_t getDecodedLength(const Byte* from,
                                const Byte* from_end) const
Returns the number of Unicode characters that would be created by decoding the array of bytes starting at from. Depending on the native OpenTop encoding, this is not necessarily the same as the number of CharType values that will be required to represent those Unicode characters.

Parameters:
from - pointer to the start of an encoded array of bytes
from_end - pointer to the next byte after the end of the array
Returns:
the number of Unicode characters represented by the byte sequence
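
To illustrate why the character count and the CharType count can differ, here is a small self-contained sketch for UTF-8 input. It is not the OpenTop implementation: Byte is a hypothetical stand-in typedef, both function names are invented for this example, the input is assumed to be well-formed UTF-8 (no validation is done), and the second function assumes a native encoding of UTF-16, which is only one possibility.

```cpp
#include <cstddef>

typedef unsigned char Byte;   // hypothetical stand-in, not the OpenTop type

// Counts Unicode characters in well-formed UTF-8 by counting every byte that
// is not a continuation byte (continuation bytes have the form 10xxxxxx).
std::size_t countUtf8Characters(const Byte* from, const Byte* from_end)
{
    std::size_t count = 0;
    for (const Byte* p = from; p != from_end; ++p) {
        if ((*p & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Counts the 16-bit units needed to hold the same text as UTF-16. A 4-byte
// UTF-8 sequence (lead byte 11110xxx) encodes a character above U+FFFF,
// which needs a surrogate pair, i.e. two units.
std::size_t countUtf16Units(const Byte* from, const Byte* from_end)
{
    std::size_t units = 0;
    for (const Byte* p = from; p != from_end; ++p) {
        if ((*p & 0xC0) == 0x80) {
            continue;   // continuation byte: already accounted for
        }
        units += ((*p & 0xF8) == 0xF0) ? 2 : 1;
    }
    return units;
}
```

For the ten bytes that encode "A", U+00E9, U+20AC and U+1F600, the first function counts 4 characters while the second counts 5 units, because U+1F600 lies outside the Basic Multilingual Plane.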

getEncodingName

virtual String getEncodingName() const
Returns the canonical name for the encoding handled by this CodeConverter.


getInvalidCharAction

CharAction getInvalidCharAction() const
Returns the policy for dealing with invalid byte sequences or Unicode characters which cannot be mapped into the internal OpenTop character encoding.

See also:
setInvalidCharAction()

getInvalidCharReplacement

UCS4Char getInvalidCharReplacement() const
Returns the Unicode character that will be used when this CodeConverter detects an invalid byte sequence.

See also:
getInvalidCharAction()

getMaxEncodedLength

virtual size_t getMaxEncodedLength() const
Returns the maximum number of bytes used to encode a single Unicode character up to U+10FFFF.
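
As a worked example of what this value means, UTF-8 needs at most 4 bytes for any character up to U+10FFFF. The sketch below is self-contained and independent of OpenTop; utf8EncodedLength and worstCaseBytes are hypothetical helper names, and using the maximum to size an output buffer is a suggested application rather than something the library documentation prescribes.

```cpp
#include <cstddef>

// Returns the number of bytes UTF-8 uses for one code point (<= U+10FFFF).
std::size_t utf8EncodedLength(unsigned long codePoint)
{
    if (codePoint < 0x80) return 1;      // U+0000..U+007F
    if (codePoint < 0x800) return 2;     // U+0080..U+07FF
    if (codePoint < 0x10000) return 3;   // U+0800..U+FFFF
    return 4;                            // U+10000..U+10FFFF
}

// Upper bound on the bytes needed to encode characterCount characters when
// no single character needs more than maxEncodedLength bytes.
std::size_t worstCaseBytes(std::size_t characterCount, std::size_t maxEncodedLength)
{
    return characterCount * maxEncodedLength;
}
```

A buffer of worstCaseBytes(n, 4) bytes is therefore always large enough to hold n characters encoded as UTF-8, at the cost of over-allocating for mostly-ASCII text.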


getUnmappableCharAction

CharAction getUnmappableCharAction() const
Returns the policy for dealing with Unicode characters that cannot be mapped into the target encoding.

See also:
setUnmappableCharAction()

getUnmappableCharReplacement

UCS4Char getUnmappableCharReplacement() const
Returns the Unicode character that will be used when this CodeConverter detects an unmappable Unicode character.

See also:
getUnmappableCharAction()

handleEncodingToInternalError

protected virtual Result handleEncodingToInternalError(UCS4Char ch,
                                                       CharType* to,
                                                       const CharType* to_limit,
                                                       CharType*& to_next)
Helper function called by derived classes' decode() method when they encounter a Unicode character which cannot be mapped into the native OpenTop encoding.

Parameters:
ch - the unmappable Unicode character.
to - pointer to the next CharType element in the output array for the current decoding operation.
to_limit - pointer to the next element after the end of the output array.
to_next - return parameter which holds a pointer to the next unused element in the result array.
Returns:
a Result code indicating the success of the operation
Exceptions:
UnmappableCharacterException - if this CodeConverter has been configured to abort when it encounters an unmappable Unicode character.
Since:
OpenTop 1.5

handleInvalidByteSequence

protected void handleInvalidByteSequence(const Byte* from,
                                         size_t len) const
Helper function that simply throws a MalformedInputException.

Exceptions:
MalformedInputException - always

handleUnmappableCharacter

protected virtual Result handleUnmappableCharacter(UCS4Char ch,
                                                   Byte* to,
                                                   Byte* to_limit,
                                                   Byte*& to_next)
Helper function called by derived classes' encode() method when they encounter an unmappable Unicode character.

Parameters:
ch - the unmappable Unicode character
to - pointer to the next byte in the output byte array for the current encoding operation
to_limit - pointer to the next byte after the end of the output byte buffer
to_next - return parameter which holds a pointer to the next byte in the result array
Returns:
a Result code indicating the success of the operation

internalEncodingError

protected void internalEncodingError(const CharType* from,
                                     size_t len) const
Helper function called by derived classes when they encounter a badly encoded internal CharType array.

Parameters:
from - pointer to the start of the array
len - length of the array

setInvalidCharAction

void setInvalidCharAction(CharAction eAction)
Sets the policy for dealing with badly encoded byte sequences or encoded sequences which result in a Unicode character which cannot be mapped into the native OpenTop encoding. Two policies are supported: replace or abort.

When the action is set to CodeConverter::abort, a CharacterCodingException is thrown by decode() when an invalid or unsupported byte sequence is decoded. When the action is set to CodeConverter::replace, the invalid byte sequence is decoded as the replacement character returned from getInvalidCharReplacement().

Parameters:
eAction - the required action to take.
See also:
getInvalidCharAction()
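
The difference between the two policies on the decoding side can be sketched with a toy US-ASCII decoder. Everything here is hypothetical and independent of OpenTop: the sketch namespace, the decodeAscii function and the use of std::runtime_error in place of the library's exception classes are assumptions made so that the example is self-contained.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {   // hypothetical names, not part of OpenTop

enum CharAction { abort, replace };

// Decodes US-ASCII. Any byte above 0x7F is an invalid sequence in this
// encoding and is handled according to the requested policy.
std::u32string decodeAscii(const std::string& from, CharAction action,
                           char32_t replacement)
{
    std::u32string out;
    for (unsigned char b : from) {
        if (b > 0x7F) {
            if (action == abort) {
                throw std::runtime_error("malformed input");
            }
            out.push_back(replacement);   // substitute and keep going
        } else {
            out.push_back(static_cast<char32_t>(b));
        }
    }
    return out;
}

} // namespace sketch
```

With the replace policy and U+FFFD as the replacement, the three bytes 'a', 0xFF, 'z' decode to three characters with U+FFFD in the middle; with the abort policy the same input raises an exception and nothing is returned.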

setInvalidCharReplacement

void setInvalidCharReplacement(UCS4Char ch)
Sets the replacement Unicode character used when the CodeConverter detects an invalid byte sequence.

See also:
setInvalidCharAction()

setUnmappableCharAction

void setUnmappableCharAction(CharAction eAction)
Sets the policy for dealing with Unicode characters that cannot be mapped into the target encoding. Two policies are supported: replace or abort.

When the action is set to CodeConverter::abort, an UnmappableCharacterException is thrown by encode() when an unmappable Unicode character is encoded. When the action is set to CodeConverter::replace, the unmappable character is replaced by the character returned from getUnmappableCharReplacement().

Parameters:
eAction - the required action to take.
See also:
getUnmappableCharAction()
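
The same two policies apply on the encoding side. The sketch below models them with a miniature ISO-8859-1 encoder. It is not the OpenTop implementation: the Latin1Encoder class is hypothetical, std::runtime_error stands in for UnmappableCharacterException, and the '?' default replacement character is an assumption made for this example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical miniature encoder; it mirrors the documented policy names but
// is not the OpenTop implementation.
class Latin1Encoder {
public:
    enum CharAction { abort, replace };

    Latin1Encoder() : m_action(replace), m_replacement(U'?') {}

    void setUnmappableCharAction(CharAction action) { m_action = action; }
    void setUnmappableCharReplacement(char32_t ch) { m_replacement = ch; }

    // Encodes code points as ISO-8859-1, which can only represent
    // U+0000..U+00FF. Anything above that range is unmappable.
    std::string encode(const std::u32string& from) const
    {
        std::string out;
        for (char32_t ch : from) {
            if (ch > 0xFF) {
                if (m_action == abort) {
                    throw std::runtime_error("unmappable character");
                }
                ch = m_replacement;   // assumed to be mappable itself
            }
            out.push_back(static_cast<char>(static_cast<unsigned char>(ch)));
        }
        return out;
    }

private:
    CharAction m_action;
    char32_t m_replacement;
};
```

Encoding the three characters 'a', U+20AC, 'b' with the replace policy yields "a?b", while switching to the abort policy makes the same call throw. Note that the replacement character must itself be representable in the target encoding for the replace policy to be useful.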

setUnmappableCharReplacement

void setUnmappableCharReplacement(UCS4Char ch)
Sets the replacement Unicode character used when the CodeConverter detects a Unicode character that cannot be encoded into the target encoding.

See also:
setUnmappableCharAction()

throwUnsupported

protected void throwUnsupported(unsigned long illegalChar) const




Found a bug or missing feature? Please email us at support@elcel.com

Copyright © 2000-2005 ElCel Technology   Trademark Acknowledgements