ot
class Character
#include "ot/base/Character.h"
Represents a Unicode character using an internal sequence of one or more CharType values.
It provides optimized routines for converting Unicode characters into a sequence of one or more CharType values and for decoding a CharType sequence into a single UCS4Char value.
The Character class also contains a number of convenient methods for querying the characteristics of the encoded Unicode character. These routines have counterparts in the standard C++ library, but the standard library routines rely on the capabilities of a locale which may not be available for Unicode.
| Method Summary |
void |
appendToString(String& str) const
Appends the multi-character sequence controlled by this Character to the passed String str. |
const CharType* |
data() const
Returns a pointer to the controlled CharType sequence buffer. |
CharType |
first() const
Returns the first CharType value in the controlled sequence. |
bool |
isAscii() const
Tests if the Unicode character represented by this Character is in the ASCII range U+0000-U+007F. |
bool |
isDigit() const
Tests if the Unicode character represented by this Character represents an ASCII decimal digit 0-9. |
bool |
isEOF() const
Tests if this Character is equal to the special Character: Character::EndOfFileCharacter. |
bool |
isHexDigit() const
Tests if the Unicode character represented by this Character represents an ASCII hexadecimal digit [0-9], [A-F], [a-f]. |
bool |
isSpace() const
Tests if the Unicode character represented by this Character represents white-space according to common Windows and Unix conventions. |
size_t |
length() const
Returns the number of CharType elements in the controlled sequence that are used to encode the represented Unicode character. |
bool |
operator!=(const Character& rhs) const
Inequality operator. |
bool |
operator!=(CharType c) const
Inequality operator. |
Character& |
operator=(const Character& rhs)
Assignment operator. |
bool |
operator==(const Character& rhs) const
Equality operator. |
bool |
operator==(CharType c) const
Equality operator. |
String |
toString() const
Returns a String containing the same sequence of CharType values as the sequence controlled and contained by this Character. |
UCS4Char |
toUnicode() const
Converts the controlled multi-character sequence into a 32-bit Unicode code-point value. |
| Public Static Data Members |
EndOfFileCharacter
Character EndOfFileCharacter
A Character representing the 'end of file' condition.
This is a special Character that can be returned from functions that read a single Character when the end of file condition has been reached.
| Constructor/Destructor Detail |
Character
Character()
-
Default constructor.
Creates a Character that is equivalent to the EndOfFile character.
Character
Character(const Character& rhs)
-
Copy constructor.
Constructs a Character with the same value as rhs.
- Parameters:
rhs -
the Character to copy
Character
Character(UCS4Char ch)
-
Constructs a Character by transforming the code-point value into a sequence of CharType values that represents the Unicode character in the native OpenTop encoding.
- Parameters:
ch -
the code-point value of the Unicode character.
- Exceptions:
IllegalCharacterException -
if ch is not a legal Unicode character in the range U+0000-U+10FFFF or, when OpenTop has been configured to use an incomplete Unicode character encoding (such as ISO-8859-1), if the character cannot be mapped to the configured native encoding.
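The transformation described above can be pictured with a minimal sketch. It assumes that CharType is char and that the native encoding is UTF-8, which is only one of the configurations OpenTop may be built with. The helper encodeUtf8 is hypothetical and not part of OpenTop, std::invalid_argument stands in for IllegalCharacterException, and rejecting the surrogate range is an assumption about what "legal" means here.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an OpenTop function): encodes one code point
// as a UTF-8 sequence of char values.
// std::invalid_argument stands in for IllegalCharacterException.
std::string encodeUtf8(std::uint32_t cp)
{
    // Assumption: values above U+10FFFF and UTF-16 surrogates are illegal.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a legal Unicode character");

    std::string out;
    if (cp < 0x80) {
        // One element: plain ASCII.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two elements: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three elements: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four elements: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Under these assumptions, the length of the returned string corresponds to what length() would report for the constructed Character.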
Character
Character(const CharType* pSeqStart,
size_t len)
-
Constructs a Character given a pointer to a sequence of CharType elements and its maximum length.
The input sequence consists of one or more CharType values that, when decoded, represent a single Unicode character.
The sequence, from the first CharType element up to and including any trailing elements (as indicated by the value of the first element), is copied into the internal CharType sequence.
- Parameters:
pSeqStart -
a pointer to the first element of a CharType sequence that represents at least one Unicode character.
len -
the number of CharType elements that are legally addressable within the array starting at pSeqStart
- Exceptions:
NullPointerException -
if pSeqStart is null.
IllegalCharacterException -
if the array starting at pSeqStart does not represent a valid Unicode character in the native OpenTop encoding.
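How the first element can indicate the number of trailing elements is sketched below. The sketch assumes that CharType is char and that the native encoding is UTF-8; other OpenTop configurations would use different rules. The helper sequenceLength is hypothetical, and std::invalid_argument stands in for both NullPointerException and IllegalCharacterException.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not an OpenTop function): returns how many char
// elements make up the single UTF-8 character starting at pSeqStart.
// len is the number of elements that may legally be read.
std::size_t sequenceLength(const char* pSeqStart, std::size_t len)
{
    if (pSeqStart == nullptr)
        throw std::invalid_argument("null sequence pointer");
    if (len == 0)
        throw std::invalid_argument("empty sequence");

    // The lead element announces the total length of the sequence.
    const unsigned char lead = static_cast<unsigned char>(pSeqStart[0]);
    std::size_t needed = 0;
    if (lead < 0x80)                needed = 1;
    else if ((lead & 0xE0) == 0xC0) needed = 2;
    else if ((lead & 0xF0) == 0xE0) needed = 3;
    else if ((lead & 0xF8) == 0xF0) needed = 4;
    else throw std::invalid_argument("invalid lead element");

    // Never read past the addressable part of the array.
    if (needed > len)
        throw std::invalid_argument("truncated sequence");

    // Every trailing element must have the form 10xxxxxx.
    for (std::size_t i = 1; i < needed; ++i) {
        const unsigned char b = static_cast<unsigned char>(pSeqStart[i]);
        if ((b & 0xC0) != 0x80)
            throw std::invalid_argument("invalid trailing element");
    }
    return needed;
}
```

Elements beyond the returned length are ignored, which matches the description of len as a maximum rather than an exact length.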
appendToString
void appendToString(String& str) const
-
Appends the multi-character sequence controlled by this Character to the passed String str.
- Parameters:
str -
the String which will have this Character appended
data
const CharType* data() const
-
Returns a pointer to the controlled CharType sequence buffer.
- Returns:
-
a pointer to the controlled CharType sequence.
- See also:
-
length()
first
CharType first() const
-
Returns the first CharType value in the controlled sequence.
- Returns:
-
the first CharType value in the controlled sequence.
- Exceptions:
IllegalCharacterException -
if this Character does not represent a valid Unicode character in the range U+0000-U+10FFFF.
isAscii
bool isAscii() const
-
Tests if the Unicode character represented by this Character is in the ASCII range U+0000-U+007F.
- Returns:
-
true if this Character is in the ASCII range; false otherwise.
- See also:
-
UnicodeCharacterType::IsAscii()
isDigit
bool isDigit() const
-
Tests if the Unicode character represented by this Character represents an ASCII decimal digit 0-9.
- Returns:
-
true if this Character is a decimal digit [0-9]; false otherwise.
- See also:
-
UnicodeCharacterType::IsDigit()
isEOF
bool isEOF() const
-
Tests if this Character is equal to the special Character: Character::EndOfFileCharacter.
Functions that read a character stream and return a Character need a method to indicate that the end of stream has been reached. To achieve this they return a special Character with a unique value that is different from all valid Unicode characters.
- Returns:
-
true if this Character is equal to the Character::EndOfFileCharacter; false otherwise.
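The sentinel pattern described above can be illustrated with stand-in types. Nothing in this sketch is OpenTop code: MiniChar and MiniReader are hypothetical, an empty sequence plays the role of Character::EndOfFileCharacter, and the reader handles ASCII text only.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for ot::Character (assumption): an empty sequence represents
// the end-of-file sentinel, so a default-constructed MiniChar is EOF.
struct MiniChar {
    std::string seq;
    bool isEOF() const { return seq.empty(); }
};

// Hypothetical reader that hands out one ASCII character per call and
// returns the sentinel once the text is exhausted.
class MiniReader {
public:
    explicit MiniReader(std::string text) : m_text(std::move(text)), m_pos(0) {}

    MiniChar read()
    {
        if (m_pos >= m_text.size())
            return MiniChar{};  // sentinel: end of stream
        return MiniChar{std::string(1, m_text[m_pos++])};
    }

private:
    std::string m_text;
    std::size_t m_pos;
};

// Typical read loop: keep reading until the sentinel appears.
std::size_t countCharacters(MiniReader& reader)
{
    std::size_t count = 0;
    for (MiniChar ch = reader.read(); !ch.isEOF(); ch = reader.read())
        ++count;
    return count;
}
```

Because the sentinel differs from every valid character, the loop needs no separate status flag or out-parameter.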
isHexDigit
bool isHexDigit() const
-
Tests if the Unicode character represented by this Character represents an ASCII hexadecimal digit [0-9], [A-F], [a-f].
- Returns:
-
true if this Character is a hexadecimal digit; false otherwise.
- See also:
-
UnicodeCharacterType::IsHexDigit()
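The ASCII-only tests documented for isAscii(), isDigit() and isHexDigit() do not need a locale. A minimal sketch of equivalent range checks on a code-point value follows; the function names are hypothetical and the real implementation may differ.

```cpp
#include <cstdint>

// Hypothetical locale-independent predicates on a code-point value.

// U+0000-U+007F
bool isAsciiCodePoint(std::uint32_t cp)
{
    return cp <= 0x7F;
}

// '0'-'9' (U+0030-U+0039)
bool isDigitCodePoint(std::uint32_t cp)
{
    return cp >= 0x30 && cp <= 0x39;
}

// [0-9], [A-F] (U+0041-U+0046), [a-f] (U+0061-U+0066)
bool isHexDigitCodePoint(std::uint32_t cp)
{
    return isDigitCodePoint(cp)
        || (cp >= 0x41 && cp <= 0x46)
        || (cp >= 0x61 && cp <= 0x66);
}
```

Non-ASCII digits, such as U+0663 ARABIC-INDIC DIGIT THREE, are rejected by these checks, in line with the "ASCII decimal digit" wording above.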
isSpace
bool isSpace() const
-
Tests if the Unicode character represented by this Character represents white-space according to common Windows and Unix conventions.
Space characters are:
-
'\t' U+0009 HORIZONTAL TABULATION
-
'\n' U+000A NEW LINE
-
'\f' U+000C FORM FEED
-
'\r' U+000D CARRIAGE RETURN
-
' ' U+0020 SPACE
- Returns:
-
true if this Character is a space character; false otherwise.
- See also:
-
UnicodeCharacterType::IsSpace()
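The list above can be expressed as a small predicate on a code-point value. This is a sketch of the documented behavior, not the OpenTop implementation; the function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical predicate: true only for the five characters listed above.
bool isSpaceCodePoint(std::uint32_t cp)
{
    switch (cp) {
    case 0x0009:  // HORIZONTAL TABULATION
    case 0x000A:  // NEW LINE
    case 0x000C:  // FORM FEED
    case 0x000D:  // CARRIAGE RETURN
    case 0x0020:  // SPACE
        return true;
    default:
        return false;
    }
}
```

Note that U+000B VERTICAL TABULATION is not in the documented list, whereas std::isspace in the "C" locale does accept it.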
length
size_t length() const
-
Returns the number of CharType elements in the controlled sequence that are used to encode the represented Unicode character.
- Returns:
-
the length of the controlled CharType sequence.
- See also:
-
data()
operator!=
bool operator!=(const Character& rhs) const
-
Inequality operator.
Tests if the Unicode character represented by this Character is not the same Unicode character as rhs.
- Returns:
-
false if the Unicode character represented by this Character is equal to the Unicode character rhs; true otherwise
operator!=
bool operator!=(CharType c) const
-
Inequality operator.
Tests if the internal multi-character sequence has a length other than 1 or the first member is not equal to c.
- Returns:
-
true if the Unicode character represented by this Character is not equal to the single CharType value c; false otherwise
operator=
Character& operator=(const Character& rhs)
-
Assignment operator.
Sets this Character equal to rhs.
- Returns:
-
a reference to this Character.
operator==
bool operator==(const Character& rhs) const
-
Equality operator.
Tests if the Unicode character represented by this Character is the same Unicode character as rhs.
- Returns:
-
true if the Unicode character represented by this Character is equal to the Unicode character rhs; false otherwise
operator==
bool operator==(CharType c) const
-
Equality operator.
Tests if the internal multi-character sequence has a length of 1 and the first member is equal to c.
- Returns:
-
true if the Unicode character represented by this Character is equal to the CharType value c; false otherwise
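The comparison rule for a single CharType value can be sketched as follows, using a std::string as a stand-in for the controlled sequence and assuming CharType is char. Both function names are hypothetical.

```cpp
#include <string>

// Hypothetical model of operator==(CharType): the sequence must consist
// of exactly one element, and that element must equal c.
bool equalsSingle(const std::string& seq, char c)
{
    return seq.size() == 1 && seq[0] == c;
}

// Hypothetical model of operator!=(CharType): the logical negation.
bool differsFromSingle(const std::string& seq, char c)
{
    return !equalsSingle(seq, c);
}
```

One consequence of this rule is that a Character encoded as several CharType elements never compares equal to a single CharType value, even when c matches its first element.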
toString
String toString() const
-
Returns a String containing the same sequence of CharType values as the sequence controlled and contained by this Character.
- Returns:
-
a String representing the same sequence of CharType values.
toUnicode
UCS4Char toUnicode() const
-
Converts the controlled multi-character sequence into a 32-bit Unicode code-point value.
- Returns:
-
the Unicode character represented by this Character as a 32-bit value. A value of 0xFFFF is returned if this Character has not been initialized.
- Exceptions:
IllegalCharacterException -
if this Character does not represent a valid Unicode character.
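A minimal sketch of such a conversion is shown below, assuming that CharType is char and that the native encoding is UTF-8. The helper decodeUtf8 is hypothetical, std::invalid_argument stands in for IllegalCharacterException, and the sketch does not check for overlong encodings or surrogates, which a production decoder would need to do.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an OpenTop function): decodes a string holding
// exactly one UTF-8 encoded character into its code-point value.
std::uint32_t decodeUtf8(const std::string& seq)
{
    if (seq.empty())
        throw std::invalid_argument("empty sequence");

    const unsigned char lead = static_cast<unsigned char>(seq[0]);
    std::size_t n = 0;      // total number of elements expected
    std::uint32_t cp = 0;   // payload bits gathered so far
    if (lead < 0x80)                { n = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { n = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { n = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { n = 4; cp = lead & 0x07; }
    else throw std::invalid_argument("invalid lead element");

    if (seq.size() != n)
        throw std::invalid_argument("wrong sequence length");

    // Each trailing element contributes six payload bits.
    for (std::size_t i = 1; i < n; ++i) {
        const unsigned char b = static_cast<unsigned char>(seq[i]);
        if ((b & 0xC0) != 0x80)
            throw std::invalid_argument("invalid trailing element");
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}
```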