The following is a brief specification of the main Universal Scene Description features. This document began as an internal memo attempting to explain the overall projected functionality of USD and why those features are needed, motivated in part by comparing the two internal scene description systems (Menva and TidScene) that inspired USD. We have attempted to remove the comparisons and speculation, leaving just the features currently in our working USD implementation. Nevertheless, if something seems confusing or out-of-place, please let us know in the usd-interest forum so that we can clarify/correct.

This document focuses primarily on the composition aspects of USD, as they may be the least familiar. Forthcoming documents will discuss the mechanics of the USD scenegraph, and the specifics of the geometry and shading schemas.

Problem Statement

To be able to leverage industry innovations quickly and economically, Pixar needs to be able to incorporate new applications into our pipeline easily, in a way that complements our existing toolset (including our proprietary rigging/animation system, Presto), rather than making the pipeline more complicated.

Achieving this goal is confounded by a number of factors. One of the primary factors is that our architecture for inter-application dataflow is a "many-to-many" format converter that results in the production (and storage) of many different representations of the same data. We have, with our transition to a pose-cached pipeline, begun to centralize more data in a single, cross-package format (TidScene), but that has gotten us only so far. The key remaining problems that we need to solve are the following:

  • We don’t have a good data model to represent all workflows. Many of the workflows already have good data models, with powerful layering and overrides, but the data models are different enough that we spend too much time and mental effort transforming data in order to communicate results between workflows and packages.
  • The various asset and shot files define the runtime that consumes them (e.g. Maya files only consumable by Maya, Menva files only consumable by Presto).
  • We rely on complicated build processes to produce all the data representations we require (as dictated by the applications we plan to use in the pipeline), and keep them in sync.
  • Even given the extensive build process, not all of an asset's data is made available to all applications (for a variety of reasons), which means that only a limited set of data can be overridden in sequence/shot contexts.
  • Our pipeline has always embraced the idea of building variation into assets as a first-class part of an asset's interface/API, yet with the deployment of our new animation system, variation is "live" and accessible only until it leaves the animation system.
  • Due to the last two points, an artist needs to predict the uses of an asset during its construction, because downstream consumers cannot uniformly modify its uses in shot contexts.


At a high level, the Universal Scene Description project aims to:

  • Provide a single representation (data model) of models and shots (geometry, shading, etc.) that can be consumed by any application.
  • Provide rich ways of communicating data between applications, enabling layered, non-destructive editing of data passing through an application, as well as controlling how "flat" we want a more traditional "baking" export to be.
  • Define strict protocols for that communication that can be enforced in all pipeline applications.
  • Allow for pieces of group assets (sets) and shots to be updated without needing to regenerate the entire collection.
  • Provide a representation that is fast, scales well to large scenes (both in application memory, and in total disk storage required for all assets and shots in a production, which is a strong indicator of network bandwidth consumption and network-cache efficiency), and is easy to debug.

Production priorities:

  1. Stability
  2. Performance
  3. Simplicity

Why is USD not Alembic?

At the outset of the project, we considered whether Alembic, or either of the two scene description systems already in use at Pixar (Menva and TidScene), could serve as the basis for all of our pipeline scene data.

It quickly became clear that referencing operators and the non-destructive editing capabilities they provide are vital to achieving the scalability and incremental-update goals described above. While Alembic provides a good solution for representing flat, baked/animated scene description, because it has no facility for file-referencing or sparse overrides, it cannot be our unified basis for pipeline data.

This does not preclude a future in which Alembic and USD merge into a single entity. Until that time, native Alembic files can most definitely serve as the inputs to the referencing operators in USD - that is, in any graph of referenced files in a USD scene, any leaf node can be an Alembic file.

What data would be included in this format?

Given the goals above, the data represented in this format should be consumable by a majority of the applications in our pipeline. Data that falls outside of this description should be stored in the format most appropriate for the application that consumes it (e.g. textures, shader definitions); however, the presence of that data should still be recorded in USD (i.e. as a pointer to the native file) so that the USD asset description remains the definitive description of the asset.

Terminology: What is Composition?

Throughout this document we will refer to composed scene description, composition features, and the act of composing opinions or other structured units of scene description. Like many terms in computer graphics, "composition" is already a general and broadly-used term. In USD, however, we consider composition to be the generalization of "file referencing" and "layering". Composition behavior follows a strict set of rules contained in one of the core USD modules, and in this document we will provide an overview of the features that those behaviors enable.

Motivation: Portable Pipeline Data

The sections below are an attempt to briefly classify objects in the Pixar pipeline as they pertain to the goals of this project. The text below describes a baseline representation of asset structures that are appealing to pass through the entire pipeline and make available to every application.


Models:

  • Contain geometric, shader and texture data, and other schemified data.
  • May have multiple representations (geometric variations, shading variations, LODs etc)
  • May be composed (via referencing) of other Models (i.e. sets or other "model group" structures)
  • May refer to rigging files for consumption by specific applications, but the data contained therein is not part of the composed model.


Shots:

  • Are composed of Models
  • Contain time-sampled animation
  • Contain layered FX edits & animation
  • Contain camera information
  • Contain lighting & compositing information (possibly)

Configuration files:

  • Consist of hierarchical key-value pairs
  • Are layered across the entire studio, per unit/film, per prod/sequence and per model

At first glance, configuration appears to be a completely orthogonal problem; however, there are several systems in place at Pixar that compose configuration data in a way that can be completely described as "references with overrides", and that data needs to be accessed throughout the pipeline. It is included here for discussion and acknowledgement, rather than as a constraint or requirement.

Data Storage Features

Value Representation

USD should support the storage of time-varying, hierarchical key-value pairs of data. The data stored should be strongly typed and enable the creation of domain-specific schemas (geometry, shading, etc).

Based on the constraints of performance and portability, the time-varying data should be point-sampled rather than spline-based. The reasons are that:

  • it provides the most consistent behavior between applications
  • it has a performance advantage in that the spline doesn't need to be evaluated to determine a value at a given time
  • it has a simpler representation: no knots are needed, only the per-frame values
  • it seems beyond the scope of an interchange system to prescribe how, for example, matrices should be interpolated.

Interpolation of the sampled data is left to consuming applications; although the USD core should provide complete introspectability into the time-samples for any given attribute, interpolation is not desirable as a core feature.
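In the ASCII encoding discussed later in this document, time samples sit alongside an attribute's default value; a minimal sketch (the prim and values are hypothetical, and interpolation between samples is left to the consumer):

```usda
#usda 1.0

def Sphere "Ball"
{
    # A default (time-independent) value, plus explicit time samples.
    # No knots or spline bases are stored -- only per-sample values.
    double radius = 1.0
    double radius.timeSamples = {
        101: 1.0,
        102: 1.5,
        103: 2.0,
    }
}
```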

Data Aggregation and Object Model

Typed, sampled data is stored on properties. Properties are grouped together into prims (for primitive). Prims can, in addition to properties, contain other prims, allowing us to build namespace hierarchies representing models, shots, etc. Prims can also provide schemas that prescribe the meaningful properties and types for the properties.

Both prims and properties are also able to host metadata, which is data about the prim or property that cannot vary over time; for example, an attribute's type and documentation are both encoded as metadata, as is a prim's schema type.

Finally, the outermost object that contains all of the composed prims in a scene is called a stage. The stage owns the prim scenegraph, and provides lifetime and authoring management of the files that contribute to the scene's definition.
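The pieces of this object model all have a direct spelling in the ASCII encoding; a sketch, with hypothetical names (metadata appears in parenthesized blocks, properties and child prims inside braces):

```usda
#usda 1.0
(
    doc = "Layer-level metadata lives in this parenthesized block."
)

def Xform "Model" (
    kind = "component"
)
{
    # A child prim, nested in namespace beneath /Model.
    def Mesh "Body"
    {
        # An attribute's type and documentation are metadata on the
        # property; the value itself may vary over time, metadata may not.
        custom int sides = 4 (
            doc = "A hypothetical attribute, for illustration only."
        )
    }
}
```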

ASCII vs. Binary

Given our desire for simply-described data and the ability to easily debug problems, an ASCII representation makes sense for storing references, variation, and small data values. On the other hand, given the requirements of scalable, high-performance data streaming, a random-access binary encoding is required for large data. When we consulted production users, the common preference was for performance over ease of use, though both were desired where achievable.

USD contains a flexible FileFormat plugin system that allows arbitrary file formats to be parsed, dynamically translated (if required), and composed into USD. USD will always ship with (at minimum) a complete, stable ASCII representation, as well as an efficient binary representation that may evolve over time. We have found the ASCII representation to be tremendously valuable for debugging, and plan to also use it for archiving "legacy" assets so that current software will always be able to parse the assets without needing to maintain support for deprecated binary back-ends.

Composition Features

Layering

Layering is the simplest and most basic composition feature. Layering for scene description is similar in concept to layering in Photoshop: we can provide an ordered list of input layers to the composition system, which will give back to us a "merged" view of the data in all the layers. However, whereas Photoshop provides a huge number of ways in which the pixel values for corresponding locations in each layer can be combined, USD supports only a small number of combine operators for scene data (not due to any inherent limitation, but because it helps maintain the understandability of composed scenes). Except for one or two special elements of scene description (one of which is List Ops, discussed below), the vast majority of data composed in layers subscribes to the simple "strongest layer's opinion wins" combine operator.

Typically, a "top level" layer will specify an ordered list of sublayers that will be composed to form the content of the top-level layer. All references to that top-level layer will (via dynamic composition) include all of the sublayers' content. We sometimes refer to a layer and its (recursive closure of) sublayers as a layer stack.

The above image demonstrates how layers can be used to organize scene description by department and workflow. The Shot_Layout.usd layer contains all of the characters in the shot, and was created and owned by the layout department. The Shot_Sets.usd layer was created and owned by the sets department. Each department can continue to work and checkpoint their progress without interfering with the other's data.
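In the ASCII encoding, the top-level shot layer from this example might declare its sublayers as follows (sublayers are listed strongest-first; the file names are taken from the example above, and the relative ordering shown here is illustrative):

```usda
#usda 1.0
(
    subLayers = [
        @Shot_Sets.usd@,
        @Shot_Layout.usd@
    ]
)
```

Because the sublayer list is itself just data in the top-level layer, departments can be re-ordered, added, or removed without touching any department's own layer.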


  • Segmenting and organizing sequence and shot data between different departments
  • Organizing model internals - a model may have a geometry layer, a shading layer, and a rigging layer
  • Layering FX animation
  • Layering FX subdiv edits
  • Layering simulated animation, such as keep-alive noise for trees
  • Lighting overrides


Activation / Deactivation

A prim on a stage can be in one of two states: active or deactivated. When a prim is deactivated, its subgraph is pruned from composition, and the prim itself does not participate in most scenegraph behaviors (e.g. it will not be listed as one of its parent's children by default). Since the active state is a field that gets composed like any other value, a prim can be deactivated or re-activated in layers much higher in the referencing chain than the one in which the prim is defined.
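A deactivation opinion is a single piece of metadata authored over the prim; a sketch, using the cup-with-handle example discussed below (prim names are hypothetical):

```usda
over "Mug"
{
    # The "without handle" look: prune the handle's subgraph from
    # composition without removing it from the layer that defines it.
    over "Handle" (
        active = false
    )
    {
    }
}
```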


Example uses of deactivation include:

  • Debugging: it is much faster to render a single character or prop than an entire scene in which it exists. Deactivation allows a user to author an opinion that a subgraph should not be considered as part of the scene, without needing to regenerate the pruned scene completely.
  • Recombination of assets: In some cases, it is beneficial to reference an entire asset and deactivate branches that are not wanted, rather than authoring many references to pick out the desired branches.
  • Encapsulating variation: (skipping slightly ahead) if the differences between several variations of a model are largely the presence of various bits of geometry (e.g. cupWithHandle, cupWithoutHandle), we can define all geometry in a single layer (which makes applying shading easier, for instance) and add a "VariantSet" that adds a deactivation opinion for the handle in one of the variants.

Referencing with Overrides

After Layering, the next most fundamental composition mechanism is a Reference. It allows scene description to be instantiated from an external layer without copying the contents into the referencing layer. This is analogous to header includes in C++, including their recursive nature.

A reference consists of the following:

  • The AssetPath identifying the external file to reference
  • A local scene path at which to place the referenced scene description (e.g. /World/anim/chars/Buzz)
  • A remote scene path (in the external file) from which to extract the information
  • A time offset and scale to be applied to all time-varying data extracted from the external file, for animation retiming

Once the scene is composed, the local scene path at which the reference was authored replaces the remote path name, and the referenced subgraph appears as if it were authored at the reference point. The image below shows an example of a simple reference with no overrides:

This example shows a shot referencing a model, but references can also be used within a model to reference other models, as, for example, in a set.
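All four pieces of a reference listed above have a direct spelling in the ASCII encoding; a sketch (file path, prim names, and the retiming values are hypothetical), where the prim would sit at a local path such as /World/anim/chars/Buzz beneath its ancestor prims:

```usda
def Xform "Buzz" (
    # AssetPath, remote scene path, and an optional layer offset
    # (offset/scale) applied to all time-varying data pulled in.
    references = @Buzz/Buzz.usd@</Buzz> (offset = 10; scale = 0.5)
)
{
}
```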

"List Ops" and List Editing of References

Any prim can reference arbitrarily many layers, and the relative strength of the referenced layers follows the ordering of the AssetPaths in the reference list. Given that "references" are actually lists, USD provides "List Op" operators, expressible in layers, for manipulating a prim's references list. In any sublayer of a layer stack, at a given prim's namespace location, we can add, delete, or reorder any of the references expressed on the prim; these list-editing operations are applied in reverse order relative to the sublayer strength ordering.
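In the ASCII encoding, the list-op qualifiers appear directly on the references metadata; a sketch in a stronger sublayer (all file and prim names are hypothetical):

```usda
# Edit the prim's references list rather than replacing it wholesale:
# add one reference at the front, and remove one authored elsewhere.
over "Buzz" (
    prepend references = @BuzzAccessories.usd@
    delete references = @OldBuzzFix.usd@
)
{
}
```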

Overriding referenced values

If a subgraph in the referencing layer overlays prims that were defined in the referenced layer, the values will be composed, first checking the referencing layer and then the referenced layer for values. Below is an example where points are overridden:
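In usda text, such a sparse override might look like the following sketch (prim names and values are hypothetical):

```usda
over "Buzz"
{
    over "Body"
    {
        # Only the points opinion is authored in this layer; every other
        # property still resolves to the referenced layer's values.
        point3f[] points = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    }
}
```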


Example uses of referencing include:

  • Set construction
  • "Group" model construction – at Pixar we publish "character group" models that reference in a character and all of its clothing, prop, and ancillary models
  • Shot construction – a pose-cache references the individual models contained within the shot, overriding only the attributes that have been modified in the shot context. At Pixar we choose to allow these references to remain "live" in order to pick up changes to assets without needing to re-bake the pose-cache. This is not a requirement; references can be localized to a pose-cache in a variety of ways, along a spectrum of tradeoffs between locality and filesize.
  • Incremental pose caching – allows for individual models in a shot to be cached while a shot is being animated


Variants

Often, a model's geometry or shading will have multiple looks or variations. Rather than creating individual copies of the model to represent each combination of each axis of variation, a variant can be explicitly declared in the scene description, and a specific variation can be selected and composed on demand. Variation in USD is declared using a VariantSet, which defines multiple Variants; each Variant provides a unique view of the world. One (and only one) Variant can be selected per VariantSet by specifying a Variant Selection opinion.

  • Variant Set – The name for a group of variants. For example, for a coffee mug that can either have a handle or not have a handle, one might create a Variant Set called "Handle", which would contain Variants "withHandle" and "noHandle".
  • Variant – A variant provides one view of the scene, such as the coffee mug with the handle present. Each Variant in a VariantSet is free to override or create scene description in any part of the namespace rooted at the VariantSet - the variants need not have anything in common, although they typically do, in Pixar's common use.
  • Variant Selection – Each variant set has one selection, which will enable a specific variant for composition.
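The coffee-mug example can be sketched in usda as follows (prim names are hypothetical; the "Handle" VariantSet, its two Variants, and the selection all live together on the prim):

```usda
def Xform "Mug" (
    prepend variantSets = "Handle"
    variants = {
        string Handle = "withHandle"
    }
)
{
    # Geometry common to all variants.
    def Mesh "Cup"
    {
    }

    variantSet "Handle" = {
        "withHandle" {
            def Mesh "Handle"
            {
            }
        }
        "noHandle" {
        }
    }
}
```

A stronger layer that references this model can switch looks simply by authoring a different value into the `variants` dictionary.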

Variants are Combinatorial

The true power of variants is that we can define multiple VariantSets on the same prim (typically the root prims of models, in Pixar's pipeline), and the results of the selections of each variant combine uniformly in well-understood ways. Multiple VariantSets can be siblings of each other, in which case the order in which they are arranged in scene description determines their relative strength (to resolve cases where multiple VariantSets provide opinions for the same properties). But VariantSets can also be nested; for example, since "modeling variant" and "level of detail variant" are not orthogonal in their effect on the model, we may nest the LODVariant VariantSet inside the ModelingVariant VariantSet.

Examples of variation are shown at the end of this document, in the Pipeline Data Examples section.

Classes and Inheritance

Although tools like CEL expressions in Katana allow us to make sweeping, pattern-based edits to many properties with a small amount of specification, we have found it very useful to be able to express such edits as scene description, so that the edits remain live and modifiable (and overridable!) as data flows down the pipeline.

In USD, any prim can inherit from one or more class prims, whose namespace topology and property values will be inherited by the prim. A class prim is a special kind of prim only in that it is "abstract" and never considered for rendering; it can have any number and type of prim children, and any kind of property or composition operator defined on it. If the class prim does define children, those children will be instantiated as children of every prim that inherits from the class.

USD inheritance vs OOP inheritance

Technical artists and engineers familiar with object-oriented languages such as C++ may be wondering how classes in USD relate to classes in programming languages. In C++, when class Derived derives from class General, or we interact with an object instance of type General, we are inheriting the behavior of class General, and class Derived is able to override the inherited behaviors.

Although clients of USD may vary their interpretation of the data stored in prims based on the resolved typename metadata of the prim (which determines the schema to which it subscribes), the scene description contained in USD prescribes no behaviors other than the generic composition behaviors we are enumerating in this section. Therefore, class inheritance in USD is all and only about inheriting structured data. Classes provide a way of concisely organizing data that is applicable to many instances of a "class" of scene description. A "fully baking" export process could choose to flatten out data from classes onto each instance of the class, although since we can alternately "localize" all class opinions within their respective classes in the export file, there would be little advantage to fully flattening the classes, and a potential increase in filesize.

A very useful property of classes is that they can be defined and "packaged" with leaf assets in a deeply-nested referencing structure, and yet still appear and be overridable at the root of the referencing structure. For example, as in the diagram below, we can declare that the "Book" asset inherits from "_class_Book" (which may itself inherit from "_class_Prop", which inherits from "_class_Model" - at Pixar the typical class hierarchy is four deep). In the Shot_Sets.usd layer, we have referenced the "Book" asset to create three instances of the book and we provide a color for the "BookCover" prim that will be inherited by all instances except "Book3", because we expressed a color opinion for it directly.

The last point is worth calling out, because the ability to identify and preserve (as data flows from application to application) exceptions to the group-edits contained in classes is something not easily accomplished with pattern-matching-based group edits.
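The Book example can be sketched in usda; a hypothetical fragment of the Shot_Sets.usd layer (file paths and the color values are illustrative, and `primvars:displayColor` stands in for whatever color attribute the shading schema defines):

```usda
class "_class_Book"
{
    # An opinion authored here is inherited by every Book instance.
    over "BookCover"
    {
        color3f[] primvars:displayColor = [(0.8, 0.1, 0.1)]
    }
}

def Xform "Book1" (
    inherits = </_class_Book>
    references = @Book/Book.usd@</Book>
)
{
}

def Xform "Book3" (
    inherits = </_class_Book>
    references = @Book/Book.usd@</Book>
)
{
    # A direct opinion on the instance is stronger than the inherited
    # class opinion, preserving this exception as data.
    over "BookCover"
    {
        color3f[] primvars:displayColor = [(0.1, 0.1, 0.8)]
    }
}
```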

Model Hierarchy

We have already discussed the categorization of assets in the pipeline to include models and model groups. USD allows you to extend this categorization (via a customizable type hierarchy), but these two "core" categorizations are important, because they define the model hierarchy of any aggregate, sequence, or shot.

The primary reasons for supporting the notion of model hierarchy are the understanding that there are many common and important tasks that can be performed on just the model "interfaces" as represented by the root prims of the models, and that it is increasingly rare that an entire shot can be loaded into memory at the same time. Model Hierarchy gives us a natural and convenient granularity for managing the "working set" of the scene upon which we need to operate.

Stage Population and Payloads

When a scene is opened on a stage, we can choose to populate and compose ("load") the entire scene up front, or just load the model hierarchy, which can typically be done very quickly, consuming little memory. When we open the stage "unloaded", we then have the freedom to load models individually or in groups. The concept of model hierarchy, and the ability to introspect into the model hierarchy with very low latency, are very important in Pixar's pipeline.

To make model hierarchy as efficient and low-latency as possible, we introduced a composition feature called a "payload". A payload is a specialization of a reference, but its target is ignored during initial Stage population; from the model hierarchy presentation of a Stage, one must explicitly load a model to compose the prims underneath the model's root, which also pulls in all of the scene description targeted by the model prim's payload. Like all composition features, authoring payloads into one's models is completely optional in a pipeline. The sole consequence of not using payloads is that model hierarchy becomes potentially much higher-latency and heavier, because we must open every file that contributes to any referenced asset's scene description just to compose the root prim of each model instance. With payloads, by contrast, model hierarchy can be populated by consulting only one very small "asset interface" file per model.
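A model's "asset interface" file can be sketched in usda as follows (file path, prim name, and the extent values are hypothetical; only the lightweight interface data lives outside the payload):

```usda
def Xform "Buzz" (
    kind = "component"
    payload = @Buzz/Buzz_payload.usd@</Buzz>
)
{
    # A representative bounding box, available even while the model's
    # (heavyweight) payload remains unloaded.
    float3[] extentsHint = [(-1, -1, -1), (1, 1, 1)]
}
```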


Example benefits of payloads include:

  • When editing scenes where it is known that a large portion of the graph needs to be in memory, it is preferable to explicitly load the scene in a user's session, so the user can do other work while it loads, avoiding the many small pauses that would occur if the scene were loaded lazily.
  • In cases where the user wants to read a small subset of the overall scene, it makes sense to avoid loading anything other than the desired subgraph(s).
  • Reduces latency in offline, multi-threaded renders. We can very quickly discover the model hierarchy, which can then be used to partition work among threads, with each thread loading only the assets it requires for its assigned task, in parallel.
  • Deploying payloads in models gives us a way to economically separate the model's contents from its interface. The interface becomes the "top level" file for referencing the model into aggregates, and contains only the information clients find useful prior to loading the (heavyweight, potentially spanning multiple files) contents of the model. The types of things we put into our interface files are:
    • Declaration of the model's variantSets and their allowable variants, so that they may be selected prior to loading the model.
    • Declaration of the class inheritance structure of the model, so that it applies to the model's root prim in the same way regardless of whether the payload is loaded or not.
    • A "representative" bounding box, as a hint for clients that wish to provide a rough, not-guaranteed-to-be-accurate spatial approximation of the model prior to loading it.
    • Of course, the "kind" metadata that identifies the prim as a model

Namespace Ordering

If enabled, namespace ordering preserves the order in which prims and properties were authored. By default, USD preserves namespace ordering for prims, but properties and metadata are sorted in alphabetical order when queried, to improve composition performance.


Relationships

A relationship defines an association, allowing a prim to express interest in another prim or property (the targets) in the scene graph. For example, a subdivision surface could use a relationship to indicate which shader to use at render time. The image below shows this scenario, in which the "shader" property on the "Visor" prim is a relationship to the "Glass" shader:

Although USD does not currently support it in its object model, the underlying data model on which USD is based supports the database-inspired concept of "relational attributes." Each target of a relationship may itself have attributes that will "stick" to their targets as targets are added, removed, or reordered on the relationship. The attributes and targets of the relationship are composed using the list editing and namespace ordering composition rules described previously. If compelling use cases arise, we can add API support for relational attributes, but would prefer to avoid the complexity they introduce until the need is clear.

An example of a relational attribute might be in how we "weight" the contribution of coshaders to a surface shader. The surface identifies the coshaders it will consume by constructing a relationship that targets all of the coshaders. Each target arc of the relationship can "carry" a "contribution" attribute that specifies the weighting of each coshader to the surface. Like references, relationship targets can be "list edited"; so, given a relationship with four targets (each with "contribution"), we can, in an editing application, remove the second target from the relationship, and because the "contribution" attributes stick to the target arcs, the third and fourth targets stay in sync with their contributions. If instead the contributions were encoded as a separate array whose length was the same as the number of targets in the relationship, then all application editing code would be responsible for keeping the array(s) in sync with their associated relationship whenever the relationship is edited.

Note that relationships do not imply any composition of the data stored at the source or the target, and do not cause the composition system to change the namespace topology of the scene. A relationship is analogous to a pointer, which must be interpreted by the consumer, whereas a reference actually composes the data of the two graphs and alters the scene topology.
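The Visor/Glass scenario above might be spelled in usda as follows (the scene path is hypothetical; the target is stored as a path, not composed):

```usda
def Mesh "Visor"
{
    # A relationship is a pointer for consumers to interpret; unlike
    # a reference, it neither composes data nor alters namespace.
    custom rel shader = </World/Shaders/Glass>
}
```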

Value Clips

For any given prim, we may specify an additional time-varying asset-path that names source files to be used for value resolution of attributes in the namespace rooted at the prim. This allows animation with common scene topology to be sequenced over time at a given location in namespace. The time-varying "clips" can be retimed and looped at the point of application (and in stronger overrides).


Example uses of value clips include:

  • Animation of crowds. Character animation is baked into clips, which are then sequenced together using animated value references.
  • File-per-frame "big data" for FX. The results of some simulations and other types of sequentially-generated special effects generate so much data that it is most practical for the simulator to write out each time-step or frame's worth of data into a different file. USD Clips make it possible to stitch all of these files together into a continuous (even though the data may itself be topologically varying over time) animation, without needing to move, merge, or perturb the files that the simulator produced. The USD toolset includes a utility usdstitchclips that efficiently assembles a sequence of file-per-frame layers into a Value Clips representation.
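The stitched result can be sketched in usda roughly as follows; treat this as an approximation of the clip metadata (file names and values are hypothetical, and the exact metadata spelling may differ between USD versions):

```usda
def "FluidSim" (
    # One source layer per frame, mapped onto the stage's timeline.
    clipAssetPaths = [@sim.101.usd@, @sim.102.usd@, @sim.103.usd@]
    clipPrimPath = "/FluidSim"
    # (stageTime, clipIndex): which clip is active at each time...
    clipActive = [(101, 0), (102, 1), (103, 2)]
    # ...and (stageTime, clipTime): how stage time maps into the clip,
    # which is also where retiming and looping are expressed.
    clipTimes = [(101, 101), (102, 102), (103, 103)]
)
{
}
```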

Pipeline Data Examples


Models

A Model is a standalone entity in the pipeline representing, for example, a prop, character, hair, garment, or architecture. Upon initialization, three USD files are created: one for the model definition, one for the geometry, and one for the public shader interface, as depicted below.

The references in this structure allow the core definition, the geometry or the shader to be individually replaced or regenerated in isolation.

The core definition is an ASCII file, allowing for easy editing of additional references or meta data. For performance, the geometry USD file is binary and is created via a modeling package, such as Maya or Modo. It is generated by periodically executing an export command, which replaces the existing geometry USD file. The shader interface is stored as ASCII and can either be constructed in the same application as the geometry or separately. Like geometry, when the export command is executed, the shader USD file is replaced.

A model may have geometric or shader variants. Those variants can be expressed using the variant composition feature. For example, if Buzz has two representations, a version with the glass visor and one without the visor, the structure can be expressed as follows with a variant set called "Visor" that has two possible states, "WithVisor" and "WithoutVisor":

When the model is referenced, a variant selection can be authored and switched to select either the desired representation.

Efficiency Note:
Although (for diagrammatic clarity) it is not depicted above, it is advantageous to expose all of a model's selectable variantSets on the model's root prim. By declaring variation at the root of the model, the model hierarchy of a shot can be traversed and the interface for each model can be exposed efficiently with minimal composition.

Aggregate Models (Sets and groups)

An aggregate model is authored by referencing existing models, positioning them (rotate, translate, scale), duplicating them or by doing more complicated instancing, such as with point clouds or simulators. When the set dressing or model editing software exports the aggregate model, the following USD structure is populated:


Shots

A shot is represented much like an aggregate model; however, its core definition has different metadata (such as FPS), and it will contain one or more cameras in addition to model references. Up until now, we've only been discussing static data for models, but shots contain baked animation. Finally, shots need additional layers so that multiple departments can contribute their work without synchronization or full baking.

The following diagram depicts a shot with a layered structure which allows for animation and FX to work independently.


Schema Note

We have not yet decided the level of schema support that USD will provide for renderer-specific "encapsulated proceduralism" (such as RenderMan's RiProcedural) or for lights. It is certainly convenient and compelling to be able to embed "practical lights" (e.g. the light-sources in street-lamps), instancing-systems used in vegetation, and built-in special effects (e.g. a robot character with a surging electric arc between its antennae) directly into models. We are still considering the ramifications on pipeline dataflow, however.