Trace: Performance tracking

Contents

Overview

Defines utility classes for counting, timing, measuring, and recording events. The TraceCollector class records TraceEvent objects. The TraceReporter class contains functions to generate reports on events gathered by the TraceCollector object.

Instrumentation

Instrumentation is done by adding TRACE macros to code.

For example:

// Typical usage.
void Foo()
{
...
{
// More work is done here that needs timing separated.
TRACE_SCOPE("Inner Scope");
...
}
}

Python tracing is also supported:

from pxr import Trace
@Trace.TraceFunction
def foo():
...
with Trace.TraceScope("Inner Scope"):
...

Adding Trace macros does have a small overhead even when tracing is disabled. Sometimes a performance sensitive function may have a slow path that is taken infrequently, but timing information is needed. In cases like this, it is possible to reduce the overhead of the instrumentation to specific scopes.

// This is an example of a performance sensitive function that has a slow path.
void PerformanceSensitiveFunction(bool slowPath)
{
// No TRACE Macro is used in the fast path to avoid any overhead.
if (slowPath) {
// This will only pay the overhead cost of the trace instrumentation on
// the slow path.
TRACE_FUNCTION_SCOPE("Slow path");
...
}
...
}

Recording and Reporting

Recording is done through the TraceCollector class. Enabling the collector will cause TRACE macros to record events. Reports are generated with the TraceReporter class.

int main(int argc, char* argv[])
{
// Start recording events.
Foo();
...
// Stop recording events.
// Print the ASCII report
ofstream reportFile("report.trace");
TraceReporter::GetGlobalReporter().Report(reportFile);
// Print the chrome-compatible report
ofstream chromeReportFile("report.json");
TraceReporter::GetGlobalReporter().ReportChromeTracing(chromeReportFile);
}

And in Python:

from pxr import Trace
# Start recording events.
Trace.Collector().enabled = True
Foo()
...
# Stop recording events.
Trace.Collector().enabled = False
# Print the ASCII report
Trace.Reporter.globalReporter.Report("report.trace")
# Print the chrome-compatible report
Trace.Reporter.globalReporter.ReportChromeTracingToFile("report.json")

The default report is an aggregated call graph showing the include time, exclusive time, and call count of each scope.

Example Report:

Tree view ==============
inclusive exclusive
13.000 ms 1 samples MainThread
13.000 ms 4.000 ms 1 samples | OuterScope
9.000 ms 8.000 ms 1 samples | InnerScope
1.000 ms 1.000 ms 2 samples | | Inner Scope 2
13.000 ms 1 samples Thread 1
13.000 ms 4.000 ms 1 samples | OuterScope
9.000 ms 8.000 ms 1 samples | InnerScope
1.000 ms 1.000 ms 2 samples | | Inner Scope 2

The Chrome tracing format is also supported using the TraceReporter::ReportChromeTracing method.

A report can also be generated from a program instrumented with libtrace using the PXR_ENABLE_GLOBAL_TRACE environment variable. If this variable is set, the TraceCollector singleton will start recording on initialization, and a report will be written to stdout at program exit.

>env PXR_ENABLE_GLOBAL_TRACE=1 usdview HelloWorld.usda --quitAfterStartup
Tree view ==============
inclusive exclusive
358.500 ms 1 samples Main Thread
0.701 ms 0.701 ms 8 samples | SdfPath::_InitWithString
0.003 ms 0.003 ms 2 samples | {anonymous}::VtDictionaryToPython::convert
275.580 ms 275.580 ms 3 samples | PlugPlugin::_Load
0.014 ms 0.014 ms 3 samples | UcGetCurrentUnit
1.470 ms 0.002 ms 1 samples | UcIsKnownUnit
1.467 ms 0.026 ms 1 samples | Uc::_InitUnitData [initialization]
1.442 ms 1.442 ms 1 samples | | Uc_Engine::GetValue
0.750 ms 0.000 ms 1 samples | UcGetValue
0.750 ms 0.750 ms 1 samples | Uc_Engine::GetValue
9.141 ms 0.053 ms 1 samples | PrCreatePathResolverForUnit
0.002 ms 0.002 ms 6 samples | UcIsKnownUnit
...

The aggregated call graph can be accessed through TraceReporter::GetAggregateTreeRoot. The non-aggregated call graph can be access through TraceReporter::GetEventTree.

Performance Overhead

Adding trace instrumentation macros has the following overhead:

  • When tracing is disabled, TRACE_FUNCTION(), TRACE_FUNCTION_SCOPE(), and TRACE_SCOPE() initialize 16 bytes of stack memory, read an atomic int, and have 2 branches.
  • When tracing is enabled, overhead of TRACE_FUNCTION(), TRACE_FUNCTION_SCOPE(), and TRACE_SCOPE() is about 100 times larger than the disabled case. (.33ns vs 33ns in microbenchmarks on our workstations).
  • The dynamic versions of the macros TRACE_FUNCTION_DYNAMIC(), TRACE_SCOPE_DYNAMIC() have a much higher overhead than the static versions. The reason for this is that for the static versions, the names of the scopes are compiled as constexpr data, but the dynamic versions construct strings at runtime. This overhead of dynamic macros is true whether tracing is enabled or not. Because of this, the static versions should be preferred whenever possible.

It is possible to disable TRACE macros from generating code by defining TRACE_DISABLE in the preprocessor.

The TraceCollector class and TRACE macros are thread-safe.

Details

For more detailed information see Trace Details