Update the information displayed in the Weave Eval dashboard by AnuradhaKaruppiah · Pull Request #390 · NVIDIA/NeMo-Agent-Toolkit

Conversation


@AnuradhaKaruppiah AnuradhaKaruppiah commented Jun 24, 2025

Description

Closes #387, #388

The following changes are included in this PR:

  1. Provide a configurable workflow alias; this allows easy comparison across multiple runs
  2. Use the summary info provided by the aiq evaluators instead of the weave auto-summarize
  3. Update the profiler runner to return the base metrics as an object (this can be extended to allow using the profiler runner programmatically; see the sketch after this list)
  4. Add usage stats to the summary page for easy comparison (see attached image)
    weave_eval_summary
  5. Add usage stats per-dataset entry; these can be selected as a dimension for plotting (see attached image)
    weave_eval_dataset_results
  6. Add a config file for easy eval-comparison demos and update the docs for eval visualization
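
For item 3, a minimal sketch of what programmatic use of the profiler runner could look like after this change is shown below. The `ProfilerResults` import path mirrors the file touched in this PR, but how the runner is constructed and the individual metric field names are assumptions for illustration, not the toolkit's actual API.

```python
from aiq.profiler.data_models import ProfilerResults  # model added in this PR


async def profile_workflow(runner) -> ProfilerResults:
    # `runner` stands in for the profiler runner from
    # src/aiq/profiler/profile_runner.py; how it is constructed is out of scope
    # here. With this PR its async run() returns a ProfilerResults object (an
    # empty one on failure) instead of only writing metrics as a side effect,
    # so callers can consume the base metrics directly.
    results: ProfilerResults = await runner.run()
    return results


def print_base_metrics(results: ProfilerResults) -> None:
    # Field names below are placeholders for illustration; the real model is
    # defined in src/aiq/profiler/data_models.py.
    for name in ("total_runtime", "total_tokens"):
        print(name, getattr(results, name, "n/a"))
```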

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah AnuradhaKaruppiah added improvement Improvement to existing functionality non-breaking Non-breaking change labels Jun 24, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah marked this pull request as draft June 24, 2025 17:15
@AnuradhaKaruppiah AnuradhaKaruppiah changed the title Ak weave eval update Update the information displayed in the Weave Eval dashboard Jun 24, 2025
It was confusing with no functional benefits

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
This needs some re-work

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah AnuradhaKaruppiah marked this pull request as ready for review June 25, 2025 23:17
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Copilot AI left a comment
Pull Request Overview

This PR updates the evaluation dashboard to improve the display of evaluation information. Key changes include configurable workflow-alias support, integration of summary info from the aiq evaluators (rather than Weave's auto-summarize), an updated profiler runner that returns base metrics as an object, and the addition of usage stats at both the summary and per-dataset level.

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| tests/aiq/eval/test_evaluate.py | Updated tests to pass a ProfilerResults instance to write_output |
| src/aiq/profiler/profile_runner.py | Updated the async run signature to return ProfilerResults and enhanced error handling by returning an empty ProfilerResults on exception |
| src/aiq/profiler/data_models.py | Added the new ProfilerResults data model |
| src/aiq/eval/utils/weave_eval.py | Refactored logger initialization and summary logging to incorporate profiler and usage-stats metrics |
| src/aiq/eval/usage_stats.py | Added new usage-stats models |
| src/aiq/eval/evaluate.py | Updated usage-stats computation and the write_output call signature |
| src/aiq/data_models/profiler.py | Extended ProfilerConfig with a base_metrics option |
| src/aiq/data_models/evaluate.py | Added workflow_alias support for the evaluation UI |
| YAML config files (examples/…) and docs/source/workflows/evaluate.md | Updated configuration and docs for evaluation visualization |
Comments suppressed due to low confidence (1)

src/aiq/eval/evaluate.py:285

  • The change in write_output's signature to require a profiler_results argument is a breaking change. Ensure that all call sites and tests are updated accordingly and document this change so that downstream users are aware.
    def write_output(self, dataset_handler: DatasetHandler, profiler_results: ProfilerResults):
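
For downstream users affected by this change, a minimal sketch of an updated call site is shown below. Only the write_output signature quoted above and the ProfilerResults model come from this PR; the surrounding names and the assumption that ProfilerResults can be constructed empty (as the runner does on exceptions) are illustrative.

```python
from aiq.profiler.data_models import ProfilerResults  # model added in this PR


def save_results(evaluator, dataset_handler, profiler_results=None) -> None:
    # `evaluator` stands in for whichever object in src/aiq/eval/evaluate.py
    # owns write_output; `dataset_handler` is the DatasetHandler that existing
    # call sites already pass.
    # Before this PR: evaluator.write_output(dataset_handler)
    # After this PR the profiler results are a required argument; falling back
    # to an empty ProfilerResults mirrors the runner's behavior on exceptions.
    evaluator.write_output(dataset_handler, profiler_results or ProfilerResults())
```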

The NIM counts are not correct and can be confusing for a user

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@dnandakumar-nv dnandakumar-nv left a comment
LGTM

@AnuradhaKaruppiah

/merge

@rapids-bot rapids-bot bot merged commit bb4ad0c into NVIDIA:develop Jun 27, 2025
12 checks passed
@AnuradhaKaruppiah AnuradhaKaruppiah deleted the ak-weave-eval-update branch August 4, 2025 00:17
AnuradhaKaruppiah added a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025