Update the information displayed in the Weave Eval dashboard by AnuradhaKaruppiah · Pull Request #390 · NVIDIA/NeMo-Agent-Toolkit

Conversation


@AnuradhaKaruppiah AnuradhaKaruppiah commented Jun 24, 2025

Description

Closes #387, #388

The following changes are included in this PR:

  1. Provide a configurable workflow alias; this allows easy comparison across multiple runs
  2. Use the summary info provided by the aiq evaluators instead of the weave auto-summarize
  3. Update the profiler runner to return the base metrics as an object (this can be extended to allow using the profiler runner programmatically; see the sketch after this list)
  4. Add usage stats to the summary page for easy comparison (see attached image)
    weave_eval_summary
  5. Add usage stats per-dataset entry; these can be selected as a dimension for plotting (see attached image)
    weave_eval_dataset_results
  6. Add a config file for easy eval-comparison demos and update the docs for eval visualization
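
For item 3, a minimal sketch of what programmatic use of the profiler runner could look like after this change is shown below. The `ProfilerResults` import path mirrors the file touched in this PR, but how the runner is constructed and the individual metric field names are assumptions for illustration, not the toolkit's actual API.

```python
from aiq.profiler.data_models import ProfilerResults  # model added in this PR


async def profile_workflow(runner) -> ProfilerResults:
    # `runner` stands in for the profiler runner from
    # src/aiq/profiler/profile_runner.py; how it is constructed is out of scope
    # here. With this PR its async run() returns a ProfilerResults object (an
    # empty one on failure) instead of only writing metrics as a side effect,
    # so callers can consume the base metrics directly.
    results: ProfilerResults = await runner.run()
    return results


def print_base_metrics(results: ProfilerResults) -> None:
    # Field names below are placeholders for illustration; the real model is
    # defined in src/aiq/profiler/data_models.py.
    for name in ("total_runtime", "total_tokens"):
        print(name, getattr(results, name, "n/a"))
```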

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah AnuradhaKaruppiah added improvement Improvement to existing functionality non-breaking Non-breaking change labels Jun 24, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah marked this pull request as draft June 24, 2025 17:15
@AnuradhaKaruppiah AnuradhaKaruppiah changed the title Ak weave eval update Update the information displayed in the Weave Eval dashboard Jun 24, 2025
It was confusing with no functional benefits

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
This needs some re-work

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah AnuradhaKaruppiah marked this pull request as ready for review June 25, 2025 23:17
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Copilot AI left a comment
Pull Request Overview

This PR updates the evaluation dashboard to improve the display of evaluation information. Key changes include configurable workflow-alias support, integration of summary info from the aiq evaluators (rather than Weave's auto-summarize), an updated profiler runner that returns base metrics as an object, and the addition of usage stats at both the summary and per-dataset level.

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| tests/aiq/eval/test_evaluate.py | Updated tests to pass a ProfilerResults instance to write_output |
| src/aiq/profiler/profile_runner.py | Updated the async run signature to return ProfilerResults and enhanced error handling by returning an empty ProfilerResults on exception |
| src/aiq/profiler/data_models.py | Added the new ProfilerResults data model |
| src/aiq/eval/utils/weave_eval.py | Refactored logger initialization and summary logging to incorporate profiler and usage-stats metrics |
| src/aiq/eval/usage_stats.py | Added new usage-stats models |
| src/aiq/eval/evaluate.py | Updated usage-stats computation and the write_output call signature |
| src/aiq/data_models/profiler.py | Extended ProfilerConfig with a base_metrics option |
| src/aiq/data_models/evaluate.py | Added workflow_alias support for the evaluation UI |
| YAML config files (examples/…) and docs/source/workflows/evaluate.md | Updated configuration and docs for evaluation visualization |
Comments suppressed due to low confidence (1)

src/aiq/eval/evaluate.py:285

  • The change in write_output's signature to require a profiler_results argument is a breaking change. Ensure that all call sites and tests are updated accordingly and document this change so that downstream users are aware.
    def write_output(self, dataset_handler: DatasetHandler, profiler_results: ProfilerResults):
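
For downstream users affected by this change, a minimal sketch of an updated call site is shown below. Only the write_output signature quoted above and the ProfilerResults model come from this PR; the surrounding names and the assumption that ProfilerResults can be constructed empty (as the runner does on exceptions) are illustrative.

```python
from aiq.profiler.data_models import ProfilerResults  # model added in this PR


def save_results(evaluator, dataset_handler, profiler_results=None) -> None:
    # `evaluator` stands in for whichever object in src/aiq/eval/evaluate.py
    # owns write_output; `dataset_handler` is the DatasetHandler that existing
    # call sites already pass.
    # Before this PR: evaluator.write_output(dataset_handler)
    # After this PR the profiler results are a required argument; falling back
    # to an empty ProfilerResults mirrors the runner's behavior on exceptions.
    evaluator.write_output(dataset_handler, profiler_results or ProfilerResults())
```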

The NIM counts are not correct and can be confusing for a user

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@dnandakumar-nv dnandakumar-nv left a comment
LGTM

@AnuradhaKaruppiah

/merge

@rapids-bot rapids-bot bot merged commit bb4ad0c into NVIDIA:develop Jun 27, 2025
12 checks passed
@AnuradhaKaruppiah AnuradhaKaruppiah deleted the ak-weave-eval-update branch August 4, 2025 00:17
AnuradhaKaruppiah added a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025