Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.
#include <unicodeUtils.h>
Classes | |
class | PastTheEndSentinel |
Models the end of iteration, reached when the underlying iterator's end condition has been met. | |
Public Types | |
using | iterator_category = std::forward_iterator_tag |
using | value_type = TfUtf8CodePoint |
using | difference_type = std::ptrdiff_t |
using | pointer = void |
using | reference = TfUtf8CodePoint |
Public Member Functions | |
TfUtf8CodePointIterator (const std::string_view::const_iterator &it, const std::string_view::const_iterator &end) | |
Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it. | |
value_type | operator* () const |
Retrieves the current UTF-8 character in the sequence as its Unicode code point value. | |
std::string_view::const_iterator | GetBase () const |
Retrieves the wrapped string iterator. | |
bool | operator== (const TfUtf8CodePointIterator &rhs) const |
Determines if two iterators are equal. | |
bool | operator!= (const TfUtf8CodePointIterator &rhs) const |
Determines if two iterators are unequal. | |
TfUtf8CodePointIterator & | operator++ () |
Advances the iterator logically one UTF-8 character sequence in the string. | |
TfUtf8CodePointIterator | operator++ (int) |
Advances the iterator logically one UTF-8 character sequence in the string. | |
Friends | |
bool | operator== (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel) |
Checks if the lhs iterator is at or past the end of the underlying string_view.
bool | operator== (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
bool | operator!= (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel rhs) |
bool | operator!= (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
Defines an iterator over a UTF-8 encoded string that extracts Unicode code point values.
UTF-8 is a variable-length encoding: a single Unicode code point is encoded as 1, 2, 3, or 4 bytes. When incremented, this iterator consumes all of the valid UTF-8 bytes that make up the current code point.
Definition at line 116 of file unicodeUtils.h.
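To make the 1-to-4-byte rule concrete, here is a minimal sketch of how the encoded length follows from the code point's value. `Utf8EncodedLength` is a hypothetical helper written for illustration; it is not part of unicodeUtils.h.

```cpp
#include <cassert>

// Hypothetical helper, not part of unicodeUtils.h: number of bytes UTF-8
// needs to encode a given Unicode code point. Surrogates (U+D800..U+DFFF)
// are not encodable in valid UTF-8; this sketch does not reject them.
inline int Utf8EncodedLength(char32_t codePoint) {
    assert(codePoint <= 0x10FFFF);        // precondition: within Unicode range
    if (codePoint < 0x80)    return 1;    // U+0000..U+007F
    if (codePoint < 0x800)   return 2;    // U+0080..U+07FF
    if (codePoint < 0x10000) return 3;    // U+0800..U+FFFF
    return 4;                             // U+10000..U+10FFFF
}
```

Because the byte count varies per code point, a plain byte iterator cannot step from one character to the next; the iterator documented here does that bookkeeping on the caller's behalf.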
class TfUtf8CodePointIterator::PastTheEndSentinel |
Models the end of iteration, reached when the underlying iterator's end condition has been met.
Definition at line 126 of file unicodeUtils.h.
using difference_type = std::ptrdiff_t |
Definition at line 120 of file unicodeUtils.h.
using iterator_category = std::forward_iterator_tag |
Definition at line 118 of file unicodeUtils.h.
using pointer = void |
Definition at line 121 of file unicodeUtils.h.
using reference = TfUtf8CodePoint |
Definition at line 122 of file unicodeUtils.h.
using value_type = TfUtf8CodePoint |
Definition at line 119 of file unicodeUtils.h.
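TfUtf8CodePointIterator (const std::string_view::const_iterator &it, const std::string_view::const_iterator &end) |
inline |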
Constructs an iterator that can read UTF-8 character sequences from the given starting string_view iterator it.
end is used as a guard against reading byte sequences past the end of the source string.
When iterating over a view of a substring, end must not point to a continuation byte of a valid UTF-8 byte sequence; otherwise the code point that straddles end is cut short and cannot be decoded correctly.
Definition at line 135 of file unicodeUtils.h.
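The constraint on end can be checked before constructing the iterator. The sketch below assumes the enclosing text is valid UTF-8; `IsContinuationByte` and `IsCodePointBoundary` are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helpers, not part of unicodeUtils.h.

// True for bytes of the form 10xxxxxx, which only appear inside a
// multi-byte UTF-8 sequence.
inline bool IsContinuationByte(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// True if cutting `text` at byte offset `pos` does not split a code point,
// assuming `text` itself is valid UTF-8.
inline bool IsCodePointBoundary(std::string_view text, std::size_t pos) {
    assert(pos <= text.size());
    return pos == text.size() || !IsContinuationByte(text[pos]);
}
```

For the four-byte string "a\xC3\xA9z" (the letter a, U+00E9, the letter z), offsets 0, 1, 3, and 4 are safe places to end a substring view, while offset 2 falls in the middle of the two-byte sequence.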
std::string_view::const_iterator GetBase () const |
inline |
Retrieves the wrapped string iterator.
Definition at line 153 of file unicodeUtils.h.
bool operator!= (const TfUtf8CodePointIterator &rhs) const |
inline |
Determines if two iterators are unequal.
The comparison intentionally ignores the end iterator, so that iterators created from different substring views of the same underlying string can be compared.
Definition at line 171 of file unicodeUtils.h.
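The effect of ignoring the end iterator can be illustrated with a simplified stand-in. `Cursor` below is a hypothetical type that only mirrors the comparison rule described above; it is not the library's class.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for the iterator's state: a position plus the end
// guard it was constructed with.
struct Cursor {
    std::string_view::const_iterator it;
    std::string_view::const_iterator end;

    // Compare positions only; the end guard is deliberately ignored.
    bool operator==(const Cursor &rhs) const { return it == rhs.it; }
    bool operator!=(const Cursor &rhs) const { return it != rhs.it; }
};

inline void CursorExample() {
    const std::string_view whole = "hello world";
    const std::string_view head = whole.substr(0, 5);  // "hello"
    const Cursor a{whole.begin(), whole.end()};
    const Cursor b{head.begin(), head.end()};
    assert(a == b);  // same position, different end guards
}
```

Had the end guard taken part in the comparison, `a` and `b` would compare unequal even though both refer to the same byte of the same string.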
value_type operator* () const |
inline |
Retrieves the current UTF-8 character in the sequence as its Unicode code point value.
Returns TfUtf8InvalidCodePoint when the byte sequence pointed to by the iterator cannot be decoded.
A code point might be invalid because it's incorrectly encoded, exceeds the maximum allowed value, or is in the disallowed surrogate range.
Definition at line 147 of file unicodeUtils.h.
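The three failure cases listed above can be sketched as a standalone decoder. This is not the library's implementation: `DecodeAt` is a hypothetical function, and `kInvalidCodePoint` is a stand-in for TfUtf8InvalidCodePoint whose value (U+FFFD) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Stand-in for TfUtf8InvalidCodePoint; the value U+FFFD is an assumption.
constexpr std::uint32_t kInvalidCodePoint = 0xFFFD;

// Hypothetical decoder: decodes the code point that starts at byte offset
// `pos`, never reading at or beyond text.size().
inline std::uint32_t DecodeAt(std::string_view text, std::size_t pos) {
    assert(pos < text.size());
    const auto byte = [&](std::size_t i) {
        return static_cast<std::uint8_t>(text[i]);
    };
    const std::uint8_t lead = byte(pos);
    if (lead < 0x80) return lead;  // single-byte (ASCII)

    int extra = 0;              // continuation bytes expected
    std::uint32_t value = 0;    // payload bits collected so far
    std::uint32_t minimum = 0;  // smallest value this length may encode
    if ((lead & 0xE0) == 0xC0)      { extra = 1; value = lead & 0x1F; minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; value = lead & 0x0F; minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; value = lead & 0x07; minimum = 0x10000; }
    else return kInvalidCodePoint;  // stray continuation byte or bad lead

    for (int i = 1; i <= extra; ++i) {
        if (pos + i >= text.size()) return kInvalidCodePoint;  // truncated
        const std::uint8_t b = byte(pos + i);
        if ((b & 0xC0) != 0x80) return kInvalidCodePoint;      // not 10xxxxxx
        value = (value << 6) | (b & 0x3F);
    }
    if (value < minimum) return kInvalidCodePoint;                     // overlong
    if (value > 0x10FFFF) return kInvalidCodePoint;                    // too large
    if (value >= 0xD800 && value <= 0xDFFF) return kInvalidCodePoint;  // surrogate
    return value;
}
```

In this sketch, "\xC3\xA9" decodes to U+00E9, while the overlong form "\xC0\x80", the surrogate encoding "\xED\xA0\x80", and the out-of-range sequence "\xF4\x90\x80\x80" all yield the invalid marker.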
TfUtf8CodePointIterator & operator++ () |
inline |
Advances the iterator logically one UTF-8 character sequence in the string.
The underlying string iterator will be advanced according to the variable-length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.
Definition at line 181 of file unicodeUtils.h.
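One way to read the advancing rule is sketched below using byte offsets instead of iterators. `NextCodePointOffset` is a hypothetical function, and the details (such as always consuming the lead byte) are assumptions rather than a transcription of the library code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch, not the library's implementation: returns the byte
// offset of the next code point after the one starting at `pos`.
inline std::size_t NextCodePointOffset(std::string_view text, std::size_t pos) {
    assert(pos < text.size());
    const unsigned char lead = static_cast<unsigned char>(text[pos]);

    // Continuation bytes implied by the lead byte's bit pattern.
    int expected = 0;
    if ((lead & 0xE0) == 0xC0)      expected = 1;
    else if ((lead & 0xF0) == 0xE0) expected = 2;
    else if ((lead & 0xF8) == 0xF0) expected = 3;

    ++pos;  // always consume the current byte so iteration makes progress
    // Consume at most `expected` continuation bytes (10xxxxxx), stopping
    // early at the end of the text or at any non-continuation byte.
    while (expected-- > 0 && pos < text.size() &&
           (static_cast<unsigned char>(text[pos]) & 0xC0) == 0x80) {
        ++pos;
    }
    return pos;
}
```

With the malformed input "\xE2(x", the lead byte announces two continuation bytes, but the sketch stops after one byte because "(" is not a continuation byte, so the following characters are still visited on their own.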
TfUtf8CodePointIterator operator++ (int) |
inline |
Advances the iterator logically one UTF-8 character sequence in the string.
The underlying string iterator will be advanced according to the variable-length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.
Definition at line 213 of file unicodeUtils.h.
bool operator== (const TfUtf8CodePointIterator &rhs) const |
inline |
Determines if two iterators are equal.
The comparison intentionally ignores the end iterator, so that iterators created from different substring views of the same underlying string can be compared.
Definition at line 162 of file unicodeUtils.h.
bool operator!= (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel rhs) |
friend |
Definition at line 234 of file unicodeUtils.h.
bool operator!= (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
friend |
Definition at line 239 of file unicodeUtils.h.
bool operator== (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel) |
friend |
Checks if the lhs iterator is at or past the end of the underlying string_view.
Definition at line 222 of file unicodeUtils.h.
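The sentinel comparison can be illustrated with simplified stand-ins. `EndSentinel` and `ByteCursor` below are hypothetical types that step one byte at a time rather than one code point; they only demonstrate the "at or past the end" test, not the library's decoding.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-ins for the iterator and its PastTheEndSentinel.
struct EndSentinel {};

struct ByteCursor {
    std::string_view::const_iterator it;
    std::string_view::const_iterator end;

    friend bool operator==(const ByteCursor &lhs, EndSentinel) {
        return lhs.it >= lhs.end;  // at or past the end guard
    }
    friend bool operator!=(const ByteCursor &lhs, EndSentinel) {
        return lhs.it < lhs.end;
    }
};

// Counts the steps taken before the cursor compares equal to the sentinel.
inline int CountSteps(std::string_view text) {
    int steps = 0;
    for (ByteCursor c{text.begin(), text.end()}; c != EndSentinel{}; ++c.it) {
        ++steps;
    }
    return steps;
}
```

Comparing against an empty sentinel type means the loop condition needs no second iterator object: the cursor already carries its own end guard.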
bool operator== (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs) |
friend |
Definition at line 228 of file unicodeUtils.h.