Overview

Defines utility classes for counting, timing, measuring, and recording events. The TraceCollector class records TraceEvent objects. The TraceReporter class contains functions to generate reports on events gathered by the TraceCollector object.

Instrumentation

Instrumentation is done by adding TRACE macros to code.

For example:

#include "pxr/base/trace/trace.h"
 
// Typical usage.
void Foo()
{
    TRACE_FUNCTION();
    ...
    {
        // More work is done here that needs timing separated.
        TRACE_SCOPE("Inner Scope");
        ...
    }
}

Python tracing is also supported:

from pxr import Trace
 
@Trace.TraceFunction
def foo():
    ...
    with Trace.TraceScope("Inner Scope"):
        ...

Adding Trace macros does have a small overhead even when tracing is disabled. Sometimes a performance sensitive function may have a slow path that is taken infrequently, but timing information is needed. In cases like this, it is possible to reduce the overhead of the instrumentation to specific scopes.

// This is an example of a performance sensitive function that has a slow path.
void PerformanceSensitiveFunction(bool slowPath)
{
    // No TRACE Macro is used in the fast path to avoid any overhead.
    if (slowPath) {
        // This will only pay the overhead cost of the trace instrumentation on
        // the slow path.
        TRACE_FUNCTION_SCOPE("Slow path");
        ...
    }
    ...
}

Recording and Reporting

Recording is done through the TraceCollector class. Enabling the collector will cause TRACE macros to record events. Reports are generated with the TraceReporter class.

int main(int argc, char* argv[])
{
    // Start recording events.
    TraceCollector::GetInstance().SetEnabled(true);
 
    Foo();
    ...
 
    // Stop recording events.
    TraceCollector::GetInstance().SetEnabled(false);
 
    // Print the ASCII report
    ofstream reportFile("report.trace");
    TraceReporter::GetGlobalReporter().Report(reportFile);
 
    // Print the chrome-compatible report
    ofstream chromeReportFile("report.json");
    TraceReporter::GetGlobalReporter().ReportChromeTracing(chromeReportFile);
}

And in Python:

from pxr import Trace
 
# Start recording events.
Trace.Collector().enabled = True
 
Foo()
...
 
# Stop recording events.
Trace.Collector().enabled = False
 
# Print the ASCII report
Trace.Reporter.globalReporter.Report("report.trace")
 
# Print the chrome-compatible report
Trace.Reporter.globalReporter.ReportChromeTracingToFile("report.json")

The default report is an aggregated call graph showing the include time, exclusive time, and call count of each scope.

Example Report:

Tree view  ==============
   inclusive    exclusive        
000 ms                    1 samples    MainThread
000 ms     4.000 ms       1 samples    | OuterScope
000 ms     8.000 ms       1 samples    |   InnerScope
000 ms     1.000 ms       2 samples    |   | Inner Scope 2
000 ms                    1 samples    Thread 1
000 ms     4.000 ms       1 samples    | OuterScope
000 ms     8.000 ms       1 samples    |   InnerScope
000 ms     1.000 ms       2 samples    |   | Inner Scope 2

The Chrome tracing format is also supported using the TraceReporter::ReportChromeTracing method.

A report can also be generated from a program instrumented with libtrace using the PXR_ENABLE_GLOBAL_TRACE environment variable. If this variable is set, the TraceCollector singleton will start recording on initialization, and a report will be written to stdout at program exit.

>env PXR_ENABLE_GLOBAL_TRACE=1 usdview HelloWorld.usda --quitAfterStartup
 
Tree view  ==============
   inclusive    exclusive        
500 ms                    1 samples    Main Thread
701 ms     0.701 ms       8 samples    | SdfPath::_InitWithString
003 ms     0.003 ms       2 samples    | {anonymous}::VtDictionaryToPython::convert
580 ms   275.580 ms       3 samples    | PlugPlugin::_Load
014 ms     0.014 ms       3 samples    | UcGetCurrentUnit
470 ms     0.002 ms       1 samples    | UcIsKnownUnit
467 ms     0.026 ms       1 samples    |   Uc::_InitUnitData [initialization]
442 ms     1.442 ms       1 samples    |   | Uc_Engine::GetValue
750 ms     0.000 ms       1 samples    | UcGetValue
750 ms     0.750 ms       1 samples    |   Uc_Engine::GetValue
141 ms     0.053 ms       1 samples    | PrCreatePathResolverForUnit
002 ms     0.002 ms       6 samples    |   UcIsKnownUnit
    ...

The aggregated call graph can be accessed through TraceReporter::GetAggregateTreeRoot. The non-aggregated call graph can be access through TraceReporter::GetEventTree.

Performance Overhead

Adding trace instrumentation macros has the following overhead:

When tracing is disabled, TRACE_FUNCTION(), TRACE_FUNCTION_SCOPE(), and TRACE_SCOPE() initialize 16 bytes of stack memory, read an atomic int, and have 2 branches.
When tracing is enabled, overhead of TRACE_FUNCTION(), TRACE_FUNCTION_SCOPE(), and TRACE_SCOPE() is about 100 times larger than the disabled case. (.33ns vs 33ns in microbenchmarks on our workstations).
The dynamic versions of the macros TRACE_FUNCTION_DYNAMIC(), TRACE_SCOPE_DYNAMIC() have a much higher overhead than the static versions. The reason for this is that for the static versions, the names of the scopes are compiled as constexpr data, but the dynamic versions construct strings at runtime. This overhead of dynamic macros is true whether tracing is enabled or not. Because of this, the static versions should be preferred whenever possible.

It is possible to disable TRACE macros from generating code by defining TRACE_DISABLE in the preprocessor.

The TraceCollector class and TRACE macros are thread-safe.

Details

For more detailed information see Trace Details

Contents

Overview

Instrumentation

Recording and Reporting

Performance Overhead

Details