Observability redesign to reduce dependencies and improve flexibility #379

mpenn · 2025-06-16T19:50:11Z

Description

Closes: #378

This PR introduces a complete overhaul of the NeMo Agent toolkit's observability infrastructure, providing a robust, scalable, and extensible framework for monitoring and tracing workflow execution.

Key Improvements

Backwards Compatibility: Existing configuration remains fully backward compatible, and extensibility is significantly improved. The exporter implementation interface has changed, however.
Modular Architecture: Plugin-based system supporting multiple telemetry backends
Reduced Dependencies: Core observability no longer requires OpenTelemetry, which is now available as an optional plugin
High Performance: Copy-on-write isolation enables efficient concurrent execution
Extensible: Type-safe processor pipelines for flexible data transformation
Multiple Backends: Built-in support for OpenTelemetry, Phoenix, and Weave

New Architecture

Class/Plugin Hierarchy

Exporter (abstract interface)
└── BaseExporter (abstract)
    └── ProcessingExporter (abstract)
        ├── SpanExporter (abstract)
        │   ├── OtelSpanExporter (abstract)
        │   │   ├── PhoenixOtelExporter (concrete)
        │   │   │   └── phoenix (plugin)
        │   │   ├── OTLPSpanAdapterExporter (concrete)
        │   │   │   ├── langfuse (plugin)
        │   │   │   ├── langsmith (plugin)
        │   │   │   ├── otelcollector (plugin)
        │   │   │   ├── patronus (plugin)
        │   │   │   └── galileo (plugin)
        │   │   └── RagaAICatalystExporter (concrete)
        │   │       └── catalyst (plugin)
        │   └── WeaveExporter (concrete)
        │       └── weave (plugin)
        └── RawExporter (abstract)
            └── FileExporter (concrete)
                └── file (plugin)

Core Components

ExporterManager

Manages exporter lifecycle and concurrent execution
Copy-on-write isolation for concurrent workflows
Automatic cleanup and resource management
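
As a rough illustration of the lifecycle handling listed above, here is a minimal sketch of a manager that starts every registered exporter and always stops them on exit. `DemoExporter`, `DemoExporterManager`, `register`, and `run_workflow` are hypothetical names invented for this example; they are not the toolkit's classes, and the real `ExporterManager` may behave differently.

```python
import asyncio
from contextlib import asynccontextmanager


class DemoExporter:
    """Hypothetical exporter that only records its lifecycle calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[str] = []

    async def start(self) -> None:
        self.log.append("start")

    async def stop(self) -> None:
        self.log.append("stop")


class DemoExporterManager:
    """Hypothetical manager: starts all exporters, always stops them."""

    def __init__(self) -> None:
        self._exporters: dict[str, DemoExporter] = {}

    def register(self, name: str, exporter: DemoExporter) -> None:
        self._exporters[name] = exporter

    @asynccontextmanager
    async def start(self):
        started: list[DemoExporter] = []
        try:
            for exporter in self._exporters.values():
                await exporter.start()
                started.append(exporter)
            yield self
        finally:
            # Stop in reverse order, even if the workflow body raised.
            for exporter in reversed(started):
                await exporter.stop()


async def run_workflow(manager: DemoExporterManager) -> str:
    # The workflow runs entirely inside the manager's context.
    async with manager.start():
        return "done"
```

Calling `asyncio.run(run_workflow(manager))` would run the workflow body with the exporters active and stop them afterwards, whether or not the body raised.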

Processing Pipeline

Processing pipelines enable flexible data transformation chains where events can be modified, enriched, or reformatted before export. This allows exporters to adapt data for different backends without duplicating transformation logic.

Processor[InputT, OutputT] generic transformation interface
BatchingProcessor for efficient dynamic batching of traces
Type-safe processor chains
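
To make the idea of a type-safe processor chain concrete, the following is a minimal sketch, assuming each processor declares its input and output types and the pipeline rejects mismatched neighbours when they are added. Only the name `Processor[InputT, OutputT]` comes from this PR; `ParseInt`, `Double`, `Pipeline`, and the `input_type`/`output_type` attributes are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

InputT = TypeVar("InputT")
OutputT = TypeVar("OutputT")


class Processor(ABC, Generic[InputT, OutputT]):
    """Hypothetical stand-in for a generic transformation step."""

    input_type: type
    output_type: type

    @abstractmethod
    def process(self, item: InputT) -> OutputT:
        """Transform one item."""


class ParseInt(Processor[str, int]):
    input_type = str
    output_type = int

    def process(self, item: str) -> int:
        return int(item)


class Double(Processor[int, int]):
    input_type = int
    output_type = int

    def process(self, item: int) -> int:
        return item * 2


class Pipeline:
    """Chains processors and checks that adjacent types line up."""

    def __init__(self) -> None:
        self._processors: list[Processor[Any, Any]] = []

    def add(self, processor: Processor[Any, Any]) -> "Pipeline":
        if self._processors:
            previous = self._processors[-1]
            if previous.output_type is not processor.input_type:
                raise TypeError(
                    f"{type(previous).__name__} outputs {previous.output_type.__name__}, "
                    f"but {type(processor).__name__} expects {processor.input_type.__name__}"
                )
        self._processors.append(processor)
        return self

    def run(self, item: Any) -> Any:
        # Feed the output of each processor into the next one.
        for processor in self._processors:
            item = processor.process(item)
        return item
```

In this sketch the mismatch is caught when the chain is built rather than when the first event flows through it, which is one plausible reading of "type-safe" here.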

Copy-on-Write Isolation

ExporterManager
├── Original Exporters (Shared Registry)
│   ├── Shared: HTTP Clients, Auth, Configuration
│   └── create_isolated_exporters()
│       ├── Workflow 1 → Isolated Instance (private tasks, events, buffers)
│       ├── Workflow 2 → Isolated Instance (private tasks, events, buffers)
│       └── Workflow N → Isolated Instance (private tasks, events, buffers)
└── Benefits: Fast (shared resources) + Safe (isolated state)

Performance Benefits

Fast: Shares expensive resources (HTTP clients, auth)
Safe: Isolates mutable state (tasks, events, buffers)
Memory Efficient: Minimal overhead for isolation
Concurrent: Enables parallel workflow execution
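
The isolation scheme above can be sketched roughly as follows, assuming a shallow copy is used so that expensive resources stay shared while mutable state is replaced per workflow. `DemoExporter`, its `client` and `buffer` attributes, and `create_isolated_instance` are hypothetical names for this example and may not match the actual implementation.

```python
import copy


class DemoExporter:
    """Hypothetical exporter with one shared resource and one private buffer."""

    def __init__(self, client: object) -> None:
        self.client = client          # expensive, shared across workflows
        self.buffer: list[str] = []   # mutable, must be private per workflow

    def create_isolated_instance(self) -> "DemoExporter":
        # Shallow copy keeps a reference to the same client object...
        clone = copy.copy(self)
        # ...while the mutable state is replaced with a fresh container.
        clone.buffer = []
        return clone

    def export(self, event: str) -> None:
        self.buffer.append(event)
```

Each workflow would receive its own instance from `create_isolated_instance()`, so events buffered by one workflow never appear in another, while all instances reuse the same client.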

Core Plugin Exporters

FileExporter: Local file export for debugging

Plugin Exporters

OpenTelemetry: Standard OTEL span export with OTLP adapter
Phoenix: AI observability platform integration
Weave: Weights & Biases integration

Integration

The system integrates seamlessly with the existing NAT runtime:

# In AIQRunner
async with self._exporter_manager.start(context_state=self._context_state):
    result = await self._entry_fn.ainvoke(self._input_message, to_type=to_type)

# In WorkflowBuilder
for key, exporter_config in telemetry_config.tracing.items():
    await self.add_exporter(key, exporter_config)

Also included are minor bug fixes:

Fixing Function.astream to set final output of ActiveFunctionContextManager for streaming requests
Fixing Function.astream to use function runtime instance name to remove ambiguity in traces (previously the config type was used)

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

- Introduced new subpackages: `aiqtoolkit-opentelemetry` and `aiqtoolkit-phoenix` for enhanced observability and integration with external services.
- Updated `pyproject.toml` and `uv.lock` to include dependencies for the new packages.
- Implemented telemetry exporters for both OpenTelemetry and Phoenix, allowing for flexible trace exporting.
- Added necessary classes and methods for span management and exporting in the observability module.

This update enhances the toolkit's capabilities for monitoring and observability, providing users with more options for telemetry integration.

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

mdemoret-nv

I think we are close but there are a few tweaks needed:

The base exporter should subscribe directly to the event stream or have an abstract method which just accepts raw IntermediateStep objects.
We need to discuss the lifetime of exporters. I'm concerned we may be creating/destroying them too frequently.
I am not sure about the ExporterRegistry. We can use factory functions in the registration if needed.

src/aiq/builder/function.py

src/aiq/builder/workflow_builder.py

src/aiq/runtime/runner.py

src/aiq/observability/exporter_manager.py

src/aiq/observability/base_exporter.py

src/aiq/observability/exporter_registry.py

src/aiq/observability/base_exporter.py

Updated the test to check that the "exporter1" instance is not only present but also a subclass of BaseExporter, improving type safety in the telemetry exporter validation. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

mdemoret-nv

Really nice piece of code. I'm impressed. A couple of general statements:

Getting type incompatibilities on uses of register_telemetry_exporter. Likely need to update the decorator types with the new registrations
Is this backwards compatible? If not, we need to properly mark and document

src/aiq/builder/workflow_builder.py

src/aiq/observability/register.py

src/aiq/data_models/span.py

src/aiq/front_ends/fastapi/step_adaptor.py

src/aiq/observability/register.py

packages/aiqtoolkit_opentelemetry/src/aiq/plugins/opentelemetry/register.py

willkill07

Very minor feedback related to UUIDs. Feel free to ignore.

src/aiq/data_models/span.py

src/aiq/observability/exporter/span_exporter.py

packages/aiqtoolkit_opentelemetry/src/aiq/plugins/opentelemetry/otel_span.py

…s for telemetry exporters. Updated the Langfuse, Langsmith, OtelCollector, Patronus, Galileo, Phoenix, and Catalyst telemetry exporters to inherit from BatchTelemetryConfigMixin, consolidating batch size control options into a single mixin. Removed redundant batch size control fields from individual exporters. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

…ed file management features

- Renamed `FileTelemetryExporter` to `FileTelemetryExporterConfig` and updated its configuration fields to include `output_path`, `mode`, `enable_rolling`, `max_file_size`, `max_files`, and `cleanup_on_init`.
- Introduced `FileMode` enum for file write modes (append/overwrite).
- Updated the `file_telemetry_exporter` function to yield a `FileExporter` instance with the new configuration.
- Renamed `ConsoleLoggingMethod` to `ConsoleLoggingMethodConfig` and updated its registration accordingly.
- Added `FileExportMixin` enhancements to manage file path conflicts and rolling log files, including methods for file cleanup and resource conflict detection.

This refactor improves the flexibility and robustness of file-based telemetry logging, allowing for better management of log files and preventing resource conflicts.

Signed-off-by: Matthew Penn <mpenn@nvidia.com>
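
For readers unfamiliar with rolling log files, here is a minimal sketch of what size-based rolling could look like. The parameter names `output_path`, `max_file_size`, and `max_files` are borrowed from the commit message above, but `RollingFileWriter`, the numeric suffix scheme, and the reading of `max_files` as the number of rolled files kept are assumptions for illustration only, not the `FileExportMixin` implementation.

```python
from pathlib import Path


class RollingFileWriter:
    """Hypothetical writer that rolls the file once it would exceed a size limit."""

    def __init__(self, output_path: Path, max_file_size: int, max_files: int) -> None:
        if max_files < 1:
            raise ValueError("max_files must be at least 1")
        self.output_path = Path(output_path)
        self.max_file_size = max_file_size  # bytes allowed in the active file
        self.max_files = max_files          # rolled files kept (assumption)

    def _rolled(self, index: int) -> Path:
        # trace.jsonl -> trace.jsonl.1, trace.jsonl.2, ...
        return self.output_path.with_name(f"{self.output_path.name}.{index}")

    def _roll(self) -> None:
        # Drop the oldest rolled file, then shift the rest up by one.
        oldest = self._rolled(self.max_files)
        if oldest.exists():
            oldest.unlink()
        for index in range(self.max_files - 1, 0, -1):
            source = self._rolled(index)
            if source.exists():
                source.rename(self._rolled(index + 1))
        self.output_path.rename(self._rolled(1))

    def write(self, line: str) -> None:
        data = (line + "\n").encode("utf-8")
        if self.output_path.exists():
            if self.output_path.stat().st_size + len(data) > self.max_file_size:
                self._roll()
        with self.output_path.open("ab") as handle:
            handle.write(data)
```

With `max_files=2`, this sketch keeps the active file plus the two most recent rolled files and deletes anything older.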

- Replaced `BatchTelemetryConfigMixin` with `BatchConfigMixin` and introduced `CollectorConfigMixin` for improved separation of concerns in telemetry exporters.
- Updated `Langfuse`, `Langsmith`, `OtelCollector`, `Patronus`, `Galileo`, `Phoenix`, and `Catalyst` telemetry exporters to utilize the new mixins, streamlining their configuration.
- Removed redundant project name fields from exporters where applicable.

This refactor enhances the clarity and maintainability of the telemetry exporter implementations.

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

mpenn · 2025-07-24T00:23:50Z

/merge

…NVIDIA#379) Closes: NVIDIA#378

Authors:
- Matthew Penn (https://github.com/mpenn)
- Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
- Michael Demoret (https://github.com/mdemoret-nv)

URL: NVIDIA#379

mpenn added 7 commits June 16, 2025 15:30

Removing commented out class in observability utils

55df074

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Formatting updates to pass checks. Updated runner to use the exporter…

41202e2

… manager context manager. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Update Function class to use instance_name in context manager and add…

a9d10da

… type check in StepAdaptor process method Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Moved Patronus exporter to opentelemetry package, removed commented out

945ac8c

exporters and removed deprecated of main pyproject.toml Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Updated registration names for new_weave and new_phoenix to weave and

55a6043

phoenix Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Updates to function to ensure final output is included in intermediate

7402d27

step output Signed-off-by: Matthew Penn <mpenn@nvidia.com>

mpenn self-assigned this Jun 17, 2025

mpenn added the improvement and non-breaking labels Jun 17, 2025

mpenn added 6 commits June 17, 2025 00:20

Added code documentation and performed minor bug fixes

74fb472

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Merge remote-tracking branch 'upstream/develop' into mpenn_observabil…

cfacc7f

…ity-redesign

Added headers to otel_span and utils

91c19d5

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Renaming unit test for observability utils

4642a68

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Fixing documentation formatting issues

f0374a2

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Fixing some documentation formatting issues

de9ea6f

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

mpenn requested a review from mdemoret-nv June 17, 2025 15:35

mpenn marked this pull request as ready for review June 18, 2025 00:52

mdemoret-nv requested changes Jun 18, 2025

mpenn added 11 commits June 20, 2025 10:05

Added docstring back to Function and updated streaming output to be list

97e4261

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Removing if telemetry_config.tracing, not needed anymore

ccf1fec

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Merge remote-tracking branch 'upstream/develop' into mpenn_observabil…

8a4d931

…ity-redesign Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Make add_exporter a public method

a3157f4

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Refactors to address remaining PR comments to decouple interfaces from

2dfd4af

implementations, removal of ExporterRegistry, and moving on_complete within ExporterManager.start context manager. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Removing unnecessary code block

75f09dc

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Adding unit tests for observability utils and processors

544bfba

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Adding headers to unit tests

2e76944

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Adding checks to makes sure nothing leaks under high concurrency and

8d2ab3b

updated tests Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Updates to improve export efficiency, including batch exports and

4de081c

lighter task scheduling Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Updating unit tests and removing bottlenecks in exporters

92e9d7d

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

mpenn and others added 6 commits July 21, 2025 00:05

Added test_add_telemetry_exporter to test_builder.py unit tests

c9f5267

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Added documentation for adding new telemetry exporters. Fixed typo in

ce2c975

workflow observability index.md re: aiq info -t, was missing components command Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Merge remote-tracking branch 'upstream/develop' into mpenn_observabil…

129b86f

…ity-redesign Signed-off-by: Michael Demoret <mdemoret@nvidia.com>

Removing extra examples files post merge

da20ce1

Signed-off-by: Michael Demoret <mdemoret@nvidia.com>

Fixing lock file

50eae77

Signed-off-by: Michael Demoret <mdemoret@nvidia.com>

mdemoret-nv approved these changes Jul 23, 2025

mpenn added 3 commits July 23, 2025 05:00

Refactor telemetry exporter type to use BaseExporter

f5cbe3e

Updated the TelemetryExporterBuildCallableT type to reference BaseExporter instead of SpanExporter. Removed the OpenTelemetry import block as it is no longer necessary. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Streamlined the of telemetry exporters add process by replacing async…

d9a339e

…io.gather with a simple for loop. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Merge branch 'mpenn_observability-redesign' of github.com:mpenn/NeMo-…

d60ec31

…Agent-Toolkit into mpenn_observability-redesign

willkill07 reviewed Jul 23, 2025

mpenn added 12 commits July 23, 2025 11:55

Removed type check for IntermediateStep as to not hide unintended err…

7f0ebe9

…ors. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Refactor SpanContext trace_id initialization to remove bitwise operat…

3294bc5

…ion for UUID. Updated SpanExporter to use span_id factory method. Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Improve documentation of pydantic models and public methods in span.py

ce5d5af

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Improve docstrings to include type hints in resource_conflict_mixin.py

6866165

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Expand span enums and convesion map to account for additional event

d4e9e2e

types (including planned event types) Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Resolving document check errors

b2a7cd6

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Resolving mermaid diagram syntax issues

19718df

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Merge remote-tracking branch 'upstream/develop' into mpenn_observabil…

8855e64

…ity-redesign Signed-off-by: Matthew Penn <mpenn@nvidia.com>

Fixing default endpoint for catalyst exporter

407151e

Signed-off-by: Matthew Penn <mpenn@nvidia.com>

mpenn added the breaking label and removed the non-breaking label Jul 24, 2025

rapids-bot bot merged commit fa35eeb into NVIDIA:develop Jul 24, 2025
12 checks passed
