TfUtf8CodePointView Class Reference (final)

Wrapper for a UTF-8 encoded std::string_view that can be iterated over as code points instead of bytes.

#include <unicodeUtils.h>

Public Types

using const_iterator = TfUtf8CodePointIterator
 

Public Member Functions

 TfUtf8CodePointView (const std::string_view &view)
 
const_iterator begin () const
 
TfUtf8CodePointIterator::PastTheEndSentinel end () const
 The sentinel compares equal to any iterator at the end of the underlying string_view.
 
const_iterator cbegin () const
 
TfUtf8CodePointIterator::PastTheEndSentinel cend () const
 The sentinel compares equal to any iterator at the end of the underlying string_view.
 
bool empty () const
 Returns true if the underlying view is empty.
 
const_iterator EndAsIterator () const
 Returns an iterator of the same type as begin that identifies the end of the string.
 

Detailed Description

Wrapper for a UTF-8 encoded std::string_view that can be iterated over as code points instead of bytes.

Because UTF-8 is a variable-length encoding, the TfUtf8CodePointView iterator is a ForwardIterator and is read-only.

std::string value{"∫dx"};
for (const auto codePoint : TfUtf8CodePointView{value}) {
    if (codePoint == TfUtf8InvalidCodePoint) {
        TF_WARN("String cannot be decoded.");
        break;
    }
}
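
The variable-length encoding mentioned above is the reason the n-th code point cannot be located without walking the bytes before it. The following is a minimal, self-contained sketch of that idea and is not part of the documented API: the names SketchUtf8SequenceLength and SketchCodePointOffsets are hypothetical, and the lead-byte check classifies by bit pattern only (it does not reject overlong or otherwise ill-formed sequences).

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not an OpenUSD function): number of bytes in the
// UTF-8 sequence introduced by `lead`, or 0 if `lead` cannot start a
// sequence (for example, a continuation byte of the form 10xxxxxx).
constexpr std::size_t SketchUtf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: byte offset at which each code point starts.
// Each offset depends on the lengths of all preceding sequences, which is
// why only forward traversal is cheap.
inline std::vector<std::size_t> SketchCodePointOffsets(std::string_view text) {
    std::vector<std::size_t> offsets;
    std::size_t i = 0;
    while (i < text.size()) {
        offsets.push_back(i);
        const std::size_t len =
            SketchUtf8SequenceLength(static_cast<unsigned char>(text[i]));
        // Step over an unrecognized byte one at a time.
        i += (len == 0) ? 1 : len;
    }
    return offsets;
}
```

For the string "∫dx" used in the example above, the integral sign occupies three bytes, so the three code points start at byte offsets 0, 3, and 4.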

The sentinel returned by TfUtf8CodePointView::end() is compatible with range-based for loops and with the C++20 ranges library, and it avoids triplicating the storage for the end iterator. EndAsIterator() can be used for algorithms that require the begin and end iterators to be of the same type, but it necessarily stores redundant copies of the endpoint.

if (std::any_of(std::cbegin(codePointView), codePointView.EndAsIterator(),
                [](const auto c) { return c == TfUtf8InvalidCodePoint; }))
{
    TF_WARN("String cannot be decoded");
}
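
To make the difference between the sentinel and EndAsIterator() concrete, here is a self-contained C++17 sketch of the pattern. It is an illustration under stated assumptions, not the OpenUSD implementation: SketchCodePointIterator, SketchCodePointView, kSketchInvalidCodePoint, and the two helper functions are hypothetical stand-ins, and the decoder is simplified (it checks sequence structure but does not reject overlong encodings or surrogates).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string_view>

// Hypothetical stand-ins for illustration; these are NOT the OpenUSD types.
constexpr std::uint32_t kSketchInvalidCodePoint = 0xFFFD;

class SketchCodePointIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = std::uint32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = std::uint32_t;

    // Empty tag type: it carries no data, so end() has nothing to store.
    struct PastTheEndSentinel {};

    SketchCodePointIterator(const char* it, const char* end)
        : _it(it), _end(end) {}

    std::uint32_t operator*() const {
        std::uint32_t cp = 0;
        Decode(&cp);
        return cp;
    }
    SketchCodePointIterator& operator++() {
        std::uint32_t cp = 0;
        _it += Decode(&cp);
        return *this;
    }

    friend bool operator==(const SketchCodePointIterator& a,
                           const SketchCodePointIterator& b) { return a._it == b._it; }
    friend bool operator!=(const SketchCodePointIterator& a,
                           const SketchCodePointIterator& b) { return a._it != b._it; }
    // Comparing against the sentinel only asks "has this iterator hit its end?".
    friend bool operator==(const SketchCodePointIterator& a,
                           PastTheEndSentinel) { return a._it == a._end; }
    friend bool operator!=(const SketchCodePointIterator& a,
                           PastTheEndSentinel) { return a._it != a._end; }

private:
    // Decodes the sequence at _it (precondition: _it != _end) and returns the
    // number of bytes it spans; malformed input yields the invalid code point.
    std::size_t Decode(std::uint32_t* out) const {
        const auto lead = static_cast<unsigned char>(*_it);
        std::size_t len = 1;
        std::uint32_t cp = lead;
        if (lead >= 0x80) {
            if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1Fu; }
            else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0Fu; }
            else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07u; }
            else { *out = kSketchInvalidCodePoint; return 1; }
        }
        if (static_cast<std::size_t>(_end - _it) < len) {
            *out = kSketchInvalidCodePoint;  // truncated sequence
            return 1;
        }
        for (std::size_t i = 1; i < len; ++i) {
            const auto c = static_cast<unsigned char>(_it[i]);
            if ((c & 0xC0) != 0x80) { *out = kSketchInvalidCodePoint; return 1; }
            cp = (cp << 6) | (c & 0x3Fu);
        }
        *out = cp;
        return len;
    }

    const char* _it;
    const char* _end;
};

class SketchCodePointView {
public:
    explicit SketchCodePointView(std::string_view view) : _view(view) {}
    SketchCodePointIterator begin() const {
        return {_view.data(), _view.data() + _view.size()};
    }
    SketchCodePointIterator::PastTheEndSentinel end() const { return {}; }
    // Full iterator positioned at the end, for same-type algorithms.
    SketchCodePointIterator EndAsIterator() const {
        const char* e = _view.data() + _view.size();
        return {e, e};
    }
private:
    std::string_view _view;
};

// Sentinel form: C++17 range-based for accepts differing begin/end types.
inline std::size_t SketchCountCodePoints(std::string_view text) {
    std::size_t n = 0;
    for (const std::uint32_t cp : SketchCodePointView{text}) { (void)cp; ++n; }
    return n;
}

// Same-type form: std::any_of requires both iterators to have one type.
inline bool SketchHasInvalid(std::string_view text) {
    const SketchCodePointView view{text};
    return std::any_of(view.begin(), view.EndAsIterator(),
                       [](std::uint32_t c) { return c == kSketchInvalidCodePoint; });
}
```

In this sketch the sentinel is an empty struct, whereas the object returned by EndAsIterator() duplicates the end pointer. That mirrors, in simplified form, the storage trade-off described above.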

Definition at line 338 of file unicodeUtils.h.

Member Typedef Documentation

◆ const_iterator

using const_iterator = TfUtf8CodePointIterator

Definition at line 340 of file unicodeUtils.h.

Constructor & Destructor Documentation

◆ TfUtf8CodePointView()

TfUtf8CodePointView ( const std::string_view &  view)
inline explicit

Definition at line 343 of file unicodeUtils.h.

Member Function Documentation

◆ begin()

const_iterator begin ( ) const
inline

Definition at line 345 of file unicodeUtils.h.

◆ cbegin()

const_iterator cbegin ( ) const
inline

Definition at line 357 of file unicodeUtils.h.

◆ cend()

TfUtf8CodePointIterator::PastTheEndSentinel cend ( ) const
inline

The sentinel compares equal to any iterator at the end of the underlying string_view.

Definition at line 364 of file unicodeUtils.h.

◆ empty()

bool empty ( ) const
inline

Returns true if the underlying view is empty.

Definition at line 370 of file unicodeUtils.h.

◆ end()

TfUtf8CodePointIterator::PastTheEndSentinel end ( ) const
inline

The sentinel compares equal to any iterator at the end of the underlying string_view.

Definition at line 352 of file unicodeUtils.h.

◆ EndAsIterator()

const_iterator EndAsIterator ( ) const
inline

Returns an iterator of the same type as begin that identifies the end of the string.

Because the end iterator is stored three times, this is slightly heavier than using the PastTheEndSentinel and should be avoided in performance-critical code paths. It is provided for convenience when an algorithm requires the begin and end iterators to have the same type.

As C++20 ranges exposes more sentinel-friendly algorithms, this can likely be deprecated in the future.

Definition at line 385 of file unicodeUtils.h.


The documentation for this class was generated from the following file: