Add support for Weave evaluation by ayulockin · Pull Request #264 · NVIDIA/NeMo-Agent-Toolkit · GitHub

@ayulockin
Contributor

Description

This PR adds the ability to run evaluations with traces and evaluation scores logged to W&B Weave, which makes it possible to compare and debug evaluation runs.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: ayulockin <mein2work@gmail.com>
@copy-pr-bot

copy-pr-bot bot commented May 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mdemoret-nv added the "feature request" (New feature or request) and "non-breaking" (Non-breaking change) labels May 29, 2025
Contributor

@AnuradhaKaruppiah left a comment

Thanks for this super cool contribution @ayulockin

ayulockin added 5 commits May 30, 2025 14:01
Signed-off-by: ayulockin <mein2work@gmail.com>
Signed-off-by: ayulockin <mein2work@gmail.com>
Signed-off-by: ayulockin <mein2work@gmail.com>
Signed-off-by: ayulockin <mein2work@gmail.com>
@ayulockin
Contributor Author

> Also can you pls clarify what the perf overhead is for having this trace enabled (beyond the weave cost)?

This is without weave turned on:

aiq eval --config_file=examples/simple/configs/eval_config_weave.yml 6.24s user 5.29s system 41% cpu 27.776 total

This is with weave enabled:

aiq eval --config_file=examples/simple/configs/eval_config_weave.yml 9.25s user 5.90s system 45% cpu 33.171 total

Note that this comparison covers only 5 examples, but enabling Weave clearly adds some performance overhead: about 5.4 s of wall-clock time in this run (27.776 s to 33.171 s).
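For readers who want the overhead as explicit numbers, the two timing lines above can be compared with a small script. This is only a sketch: the helper names (`parse_time`, `overhead`) are made up for illustration and are not part of the toolkit, and a single 5-example run is not a statistically meaningful benchmark.

```python
import re

# Timing summaries copied verbatim from the two runs quoted above.
BASELINE = "6.24s user 5.29s system 41% cpu 27.776 total"
WITH_WEAVE = "9.25s user 5.90s system 45% cpu 33.171 total"

# Matches the shell `time` summary format used in the comment.
PATTERN = re.compile(r"([\d.]+)s user ([\d.]+)s system \d+% cpu ([\d.]+) total")


def parse_time(line):
    """Extract user, system, and wall-clock seconds from a timing summary (hypothetical helper)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized timing line: {line!r}")
    user, system, total = (float(value) for value in match.groups())
    return {"user": user, "system": system, "total": total}


def overhead(baseline, candidate):
    """Return (absolute seconds, percent of baseline) for each metric (hypothetical helper)."""
    return {
        key: (candidate[key] - baseline[key],
              100.0 * (candidate[key] - baseline[key]) / baseline[key])
        for key in baseline
    }


base = parse_time(BASELINE)
with_weave = parse_time(WITH_WEAVE)
result = overhead(base, with_weave)

for metric, (delta, percent) in result.items():
    print(f"{metric:>6}: +{delta:.3f}s ({percent:.1f}% over baseline)")
```

On these two runs the wall-clock difference is roughly 5.4 s, or about 19% over the baseline, while user CPU time grows by about 48%. Repeating each run several times would be needed before treating either figure as representative.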

@ayulockin marked this pull request as ready for review May 30, 2025 09:49
@ayulockin
Contributor Author

Hey @AnuradhaKaruppiah, the race conditions are gone, and I have removed the hack that suppressed the bad stdout output. The PR is ready for review from my end.

I also want to add profiler support and tests. I have started working on both; we can add them in a separate PR, or I can include them in this one if you prefer.

@ayulockin
Contributor Author

Hey @AnuradhaKaruppiah I also ran the following checks locally.

ci/scripts/checks.sh

python ci/scripts/copyright.py --verify-apache-v2

INFO:Copyright check passed

ci/scripts/documentation_checks.sh

✔ 0 errors, 0 warnings and 0 suggestions in 101 files.

Signed-off-by: ayulockin <mein2work@gmail.com>
@AnuradhaKaruppiah
Contributor

/ok to test ef9f17b

@AnuradhaKaruppiah
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 74503fa into NVIDIA:develop Jun 23, 2025
12 checks passed
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
This PR adds the ability to run evaluations such that the traces and evaluation scores are logged to W&B Weave. This allows for comparing evaluations, debugging evaluations, etc.


## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Ayush Thakur (https://github.com/ayulockin)

Approvers:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#264
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025