Consistent Trace Nesting in Parallel Function Calling by dnandakumar-nv · Pull Request #162 · NVIDIA/NeMo-Agent-Toolkit · GitHub

@dnandakumar-nv
Contributor

Updated span stack to use a dictionary for improved traceability and optimized span handling logic. Adjusted step ID restoration to handle context switching correctly. These changes improve robustness and maintainability in asynchronous environments.
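To illustrate why a dictionary of open spans can behave better than a single stack when calls overlap, here is a minimal sketch. It does not use the toolkit's or OpenTelemetry's classes; `Span`, `StackTracker`, `DictTracker`, `EVENTS`, and `replay` are hypothetical names introduced only for this example, and the event stream is an assumed interleaving of two parallel tool calls under one LLM call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for a tracing span; not an OpenTelemetry type."""
    name: str
    parent_name: Optional[str]


class StackTracker:
    """Picks the most recently opened span as the parent (LIFO)."""

    def __init__(self) -> None:
        self._stack: List[Span] = []

    def start(self, step_id: str, parent_id: Optional[str]) -> Span:
        # Ignores parent_id: the parent is whatever sits on top of the stack.
        parent = self._stack[-1].name if self._stack else None
        span = Span(step_id, parent)
        self._stack.append(span)
        return span

    def end(self, step_id: str) -> None:
        # Pops whatever is on top, even if a different span is ending.
        self._stack.pop()


class DictTracker:
    """Looks the parent up by ID, so interleaved starts cannot confuse it."""

    def __init__(self) -> None:
        self._open: Dict[str, Span] = {}

    def start(self, step_id: str, parent_id: Optional[str]) -> Span:
        parent = self._open.get(parent_id) if parent_id is not None else None
        span = Span(step_id, parent.name if parent else None)
        self._open[step_id] = span
        return span

    def end(self, step_id: str) -> None:
        self._open.pop(step_id, None)


# (event, step_id, parent_id): two tool calls overlap under one LLM call.
EVENTS: List[Tuple[str, str, Optional[str]]] = [
    ("start", "llm", None),
    ("start", "tool_a", "llm"),
    ("start", "tool_b", "llm"),
    ("end", "tool_a", None),
    ("end", "tool_b", None),
    ("end", "llm", None),
]


def replay(tracker) -> Dict[str, Optional[str]]:
    """Feed EVENTS to a tracker and return {span name: parent name}."""
    parents: Dict[str, Optional[str]] = {}
    for kind, step_id, parent_id in EVENTS:
        if kind == "start":
            span = tracker.start(step_id, parent_id)
            parents[span.name] = span.parent_name
        else:
            tracker.end(step_id)
    return parents
```

With this event order, `replay(StackTracker())` nests `tool_b` under `tool_a`, while `replay(DictTracker())` keeps both tools under `llm`.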

Description

Refactors the intermediate step manager, the OpenTelemetry (otel) adaptor, and the LangChain callback handler to work correctly in async scenarios.

Crucially, the biggest changes are in the otel adaptor, so that traces nest correctly in async function-call scenarios (many calls in parallel).

The fixes to the LangChain callback handler allow the callback's intermediate step to nest the function call underneath it. The intermediate step manager needed to be refactored to prevent cross-context resets of the context variable.
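As a rough illustration of the cross-context issue, the sketch below uses only the standard `contextvars` and `asyncio` modules. It is not the toolkit's code: `active_step`, `tool_call`, `main`, and `cross_context_reset_fails` are hypothetical names. It shows two things: a `Token` created in one `Context` cannot be used to `reset()` the variable from another, and restoring a saved previous value with `set()` inside each task leaves the parent's value untouched.

```python
import asyncio
import contextvars

# Hypothetical variable tracking which step is currently active.
active_step = contextvars.ContextVar("active_step", default="root")


def cross_context_reset_fails() -> bool:
    """Return True if resetting with a token from another Context raises."""
    holder = {}

    def set_in_other_context() -> None:
        holder["token"] = active_step.set("elsewhere")

    # Run the set() inside a copy of the current context.
    contextvars.copy_context().run(set_in_other_context)
    try:
        # The token belongs to the copied context, not the current one.
        active_step.reset(holder["token"])
    except ValueError:
        return True
    return False


async def tool_call(name: str):
    """Mark this task's step, yield once, then restore the prior value."""
    previous = active_step.get()
    active_step.set(name)
    try:
        await asyncio.sleep(0)  # let the sibling task interleave
        return previous, active_step.get()
    finally:
        # Restore by value instead of by token.
        active_step.set(previous)


async def main():
    active_step.set("llm_call")
    # gather() wraps each coroutine in a Task with its own context copy.
    results = await asyncio.gather(tool_call("tool_a"), tool_call("tool_b"))
    return results, active_step.get()
```

Running `asyncio.run(main())` shows that each task sees `llm_call` as its parent and its own name as the active step, and that the caller still sees `llm_call` afterwards.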

Functionality changed from an old trace like this:
[Image: old trace]

to a new trace like this:
[Image: new trace]

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Updated span stack to use a dictionary for improved traceability and optimized span handling logic. Adjusted step ID restoration to handle context switching correctly. These changes improve robustness and maintainability in asynchronous environments.

Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
dnandakumar-nv self-assigned this on Apr 29, 2025
dnandakumar-nv requested a review from a team as a code owner on April 29, 2025 19:14
@copy-pr-bot
copy-pr-bot bot commented Apr 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

dnandakumar-nv added the bug (Something isn't working) label and removed the enhancement label on Apr 29, 2025

Introduces unit tests for IntermediateStepManager, covering events, context preservation, cross-thread behavior, and error handling. These tests ensure the robustness of step tracking and state transitions. Added minimal stubs to enable isolated testing without dependencies on the full codebase.

Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
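The context-preservation and cross-thread cases mentioned in that commit can be sketched with a small stub. This is not the PR's test suite or the real `IntermediateStepManager`: `push_step`, `pop_step`, `current_step`, and `run_in_thread` are hypothetical helpers. The first test also assumes a standard CPython build, where a new thread starts with an empty context instead of inheriting the caller's.

```python
import contextvars
import threading

# Hypothetical context variable holding the ID of the active step.
_current_step = contextvars.ContextVar("current_step", default=None)


def push_step(step_id):
    """Make step_id active and return the value it replaced."""
    previous = _current_step.get()
    _current_step.set(step_id)
    return previous


def pop_step(previous):
    """Restore the previously active step by value."""
    _current_step.set(previous)


def current_step():
    return _current_step.get()


def run_in_thread(fn):
    """Run fn in a new thread and return its result."""
    box = []
    worker = threading.Thread(target=lambda: box.append(fn()))
    worker.start()
    worker.join()
    return box[0]


def test_plain_thread_does_not_inherit_step():
    previous = push_step("outer")
    try:
        # A fresh thread sees the default, not the caller's step.
        assert run_in_thread(current_step) is None
    finally:
        pop_step(previous)


def test_copied_context_carries_step_into_thread():
    previous = push_step("outer")
    try:
        ctx = contextvars.copy_context()
        assert run_in_thread(lambda: ctx.run(current_step)) == "outer"
    finally:
        pop_step(previous)


def test_thread_changes_do_not_leak_back():
    previous = push_step("outer")
    try:
        ctx = contextvars.copy_context()
        # The thread changes the step only inside the copied context.
        run_in_thread(lambda: ctx.run(push_step, "inner"))
        assert current_step() == "outer"
    finally:
        pop_step(previous)
```

The functions follow the `test_*` naming convention, so a runner such as pytest would collect them; they can also be called directly.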
@mdemoret-nv
Collaborator

/ok to test ee4f3f4

This commit adds the Apache 2.0 license header to ensure compliance with licensing requirements. The header includes copyright information for NVIDIA Corporation and references to the license terms.

Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
@dnandakumar-nv
Contributor Author

/ok to test 4d96eb1

The removed assertion was unnecessary as its condition is implied by other checks in the test. Simplifying the test improves readability and maintains functionality without altering test coverage.

Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
@dnandakumar-nv
Contributor Author

/ok to test 97533e5

@dnandakumar-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 212c0aa into NVIDIA:develop Apr 30, 2025
10 checks passed
yczhang-nv pushed a commit to yczhang-nv/NeMo-Agent-Toolkit that referenced this pull request May 8, 2025
Refactors the intermediate step manager, the OpenTelemetry (otel) adaptor, and the LangChain callback handler to work correctly in async scenarios.

Crucially, the biggest changes are in the otel adaptor, so that traces nest correctly in async function-call scenarios (many calls in parallel).

The fixes to the LangChain callback handler allow the callback's intermediate step to nest the function call underneath it. The intermediate step manager needed to be refactored to prevent cross-context resets of the context variable.

Functionality changed from old trace like so:
<img width="475" alt="old trace" src="https://github.com/user-attachments/assets/1e577ee0-dbfc-4dae-80fb-8c03a57e75fb" />

to a new trace like this:
<img width="475" alt="new trace" src="https://github.com/user-attachments/assets/aeced7ae-9876-420b-8632-dd78a5fd4a1f" />

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AgentIQ/blob/develop/docs/source/advanced/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Dhruv Nandakumar (https://github.com/dnandakumar-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: NVIDIA#162
Signed-off-by: Yuchen Zhang <134643420+yczhang-nv@users.noreply.github.com>
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025
Labels: bug (Something isn't working), non-breaking (Non-breaking change)
