Update the information displayed in the Weave Eval dashboard #390
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
It was confusing with no functional benefits Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
46b4ad2 to
879744f
This needs some re-work Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Pull Request Overview
This PR updates the evaluation dashboard to improve the display of evaluation information. Key changes include configurable workflow alias support, use of the summary info from the aiq evaluators (instead of the Weave auto-summarize), an updated profiler runner that returns base metrics as an object, and the addition of usage stats at both the summary and per-dataset level.
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/aiq/eval/test_evaluate.py | Updated tests to pass a ProfilerResults instance to write_output |
| src/aiq/profiler/profile_runner.py | Updated async run signature to return ProfilerResults and enhanced error handling by returning an empty ProfilerResults on exception |
| src/aiq/profiler/data_models.py | Added new ProfilerResults data model |
| src/aiq/eval/utils/weave_eval.py | Refactored logger initialization and summary logging to incorporate profiler and usage stats metrics |
| src/aiq/eval/usage_stats.py | Added new usage stats models |
| src/aiq/eval/evaluate.py | Updated usage stats computation and write_output call signature |
| src/aiq/data_models/profiler.py | Extended ProfilerConfig with base_metrics option |
| src/aiq/data_models/evaluate.py | Added workflow_alias support for evaluation UI |
| YAML config files (examples/…) and docs/source/workflows/evaluate.md | Updated configuration and docs for evaluation visualization |
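The `profile_runner.py` and `data_models.py` rows describe a runner that returns a results object and falls back to an empty one on failure. A minimal sketch of that pattern follows. The field name `workflow_runtime_p95`, the function `run_profiler`, and the use of a stdlib dataclass are assumptions made for illustration; the actual `ProfilerResults` model in `src/aiq/profiler/data_models.py` may be defined differently.

```python
import asyncio
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfilerResults:
    """Hypothetical stand-in for the real model; the field name is an assumption."""
    workflow_runtime_p95: Optional[float] = None


async def run_profiler(runtimes: List[float]) -> ProfilerResults:
    """Return base metrics as an object; return an empty object on any error."""
    try:
        if not runtimes:
            raise ValueError("no runtimes to profile")
        ordered = sorted(runtimes)
        # Nearest-rank 95th percentile.
        index = max(0, math.ceil(0.95 * len(ordered)) - 1)
        return ProfilerResults(workflow_runtime_p95=ordered[index])
    except Exception:
        # Callers always receive a ProfilerResults, never None or an exception.
        return ProfilerResults()


ok_results = asyncio.run(run_profiler([1.0, 2.0, 3.0, 4.0]))
empty_results = asyncio.run(run_profiler([]))
```

Returning an empty object keeps downstream code, such as summary logging, free of `None` checks, at the cost of hiding the failure unless it is also logged.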
Comments suppressed due to low confidence (1)
src/aiq/eval/evaluate.py:285
- The change in write_output's signature to require a profiler_results argument is a breaking change. Ensure that all call sites and tests are updated accordingly and document this change so that downstream users are aware.
def write_output(self, dataset_handler: DatasetHandler, profiler_results: ProfilerResults):
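To make the breaking change concrete, here is a hedged sketch of what happens at a call site. Only the parameter list of `write_output` comes from the signature above. `EvaluationRunSketch`, the stub `DatasetHandler`, the stub `ProfilerResults` and its field, and the dictionary return value are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetHandler:
    """Hypothetical stub for the real DatasetHandler."""
    name: str = "demo-dataset"


@dataclass
class ProfilerResults:
    """Hypothetical stub; the field name is an assumption."""
    workflow_runtime_p95: Optional[float] = None


class EvaluationRunSketch:
    """Hypothetical stand-in for the class that owns write_output."""

    def write_output(self, dataset_handler: DatasetHandler, profiler_results: ProfilerResults) -> dict:
        # Same parameters as the quoted signature; the return value is illustrative only.
        return {
            "dataset": dataset_handler.name,
            "runtime_p95": profiler_results.workflow_runtime_p95,
        }


run = EvaluationRunSketch()

# Old-style call without profiler_results now raises TypeError.
try:
    run.write_output(DatasetHandler())
    old_call_ok = True
except TypeError:
    old_call_ok = False

# Updated call: pass a ProfilerResults instance, even an empty one.
summary = run.write_output(DatasetHandler(), ProfilerResults())
```

Downstream callers that have no profiler data can pass an empty `ProfilerResults()` rather than restructuring their code.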
The NIM counts are not correct and can be confusing for a user Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
LGTM
/merge
Closes NVIDIA#387, NVIDIA#388

The following changes are included in this PR:
1. Provide a configurable workflow alias; this allows easy comparison across multiple runs
2. Use the summary info provided by the aiq evaluators instead of using the weave auto-summarize
3. Update the profiler runner to return the base metrics as an object (this can be extended to allow using the profiler runner programmatically)
4. Add usage stats to the summary page for easy comparison (see attached image)
5. Add usage stats per-dataset entry; can be selected as a dimension for plotting (see attached image)
6. Add config file for easy eval-comp demos and update docs for eval visualization

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Dhruv Nandakumar (https://github.com/dnandakumar-nv)

URL: NVIDIA#390
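The change list mentions usage stats at two levels: per dataset entry and on the summary page. A rough sketch of how per-entry stats could roll up into a summary is shown below. All class and field names here (`UsageStatsItem`, `UsageStatsSummary`, `total_tokens`, `llm_latency`, `runtime`) are hypothetical and chosen for illustration; the models added in `src/aiq/eval/usage_stats.py` may use different names and metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UsageStatsItem:
    """Hypothetical per-dataset-entry usage stats."""
    item_id: str
    total_tokens: int
    llm_latency: float  # seconds
    runtime: float      # seconds


@dataclass
class UsageStatsSummary:
    """Hypothetical summary-level usage stats."""
    total_tokens: int
    avg_llm_latency: float
    total_runtime: float


def summarize(items: List[UsageStatsItem]) -> UsageStatsSummary:
    """Aggregate per-entry stats into one summary record."""
    if not items:
        return UsageStatsSummary(total_tokens=0, avg_llm_latency=0.0, total_runtime=0.0)
    return UsageStatsSummary(
        total_tokens=sum(i.total_tokens for i in items),
        avg_llm_latency=sum(i.llm_latency for i in items) / len(items),
        total_runtime=sum(i.runtime for i in items),
    )


items = [
    UsageStatsItem("q1", total_tokens=100, llm_latency=0.5, runtime=2.0),
    UsageStatsItem("q2", total_tokens=300, llm_latency=1.5, runtime=4.0),
]
usage_summary = summarize(items)
```

Keeping the per-entry records alongside the summary is what would let a dashboard plot an individual metric as a dimension while still showing run-level totals for comparison.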