Simplify custom evaluator definition #358

AnuradhaKaruppiah · 2025-06-10T15:33:41Z

Description

This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.

- Extract concurrency, progress bar, and average-score computation into `BaseEvaluator`.
- Simplify the similarity evaluator example to subclass `BaseEvaluator`.
- Refresh docs to explain the new extension flow with the `register_evaluator` decorator.
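To illustrate the subclassing flow described above, here is a minimal, self-contained Python sketch. It is not the implementation added in `src/aiq/eval/evaluator/custom_base_evaluator.py`. The dataclasses are simplified, hypothetical stand-ins for the toolkit's `EvalInput`, `EvalInputItem`, `EvalOutputItem`, and `EvalOutput` types. The `max_concurrency` parameter, the `completed` progress counter (used in place of a progress bar), and the `ExactMatchEvaluator` subclass are also assumptions. The rule that only numeric scores count toward the average is assumed as well, not taken from the PR.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List, Optional


# Hypothetical, simplified stand-ins for the toolkit's evaluation types.
@dataclass
class EvalInputItem:
    id: str
    expected: str
    generated: str


@dataclass
class EvalOutputItem:
    id: str
    score: Any
    reasoning: str = ""


@dataclass
class EvalInput:
    eval_input_items: List[EvalInputItem]


@dataclass
class EvalOutput:
    average_score: Optional[float]
    eval_output_items: List[EvalOutputItem] = field(default_factory=list)


class BaseEvaluator(ABC):
    """Shared driver: runs evaluate_item over every input with bounded concurrency."""

    def __init__(self, max_concurrency: int = 4):
        self.max_concurrency = max_concurrency  # assumed parameter name
        self.completed = 0  # plain progress counter standing in for a progress bar

    @abstractmethod
    async def evaluate_item(self, item: EvalInputItem) -> EvalOutputItem:
        """Subclasses implement scoring for a single item."""

    async def evaluate(self, eval_input: EvalInput) -> EvalOutput:
        semaphore = asyncio.Semaphore(self.max_concurrency)
        self.completed = 0

        async def run_one(item: EvalInputItem) -> EvalOutputItem:
            async with semaphore:  # at most max_concurrency items in flight
                result = await self.evaluate_item(item)
            self.completed += 1  # update progress once the item finishes
            return result

        # gather() returns results in input order
        items = await asyncio.gather(*(run_one(i) for i in eval_input.eval_input_items))

        # Assumption: only int/float scores (not bool) contribute to the average.
        numeric = [
            it.score for it in items
            if isinstance(it.score, (int, float)) and not isinstance(it.score, bool)
        ]
        average = sum(numeric) / len(numeric) if numeric else None
        return EvalOutput(average_score=average, eval_output_items=list(items))


class ExactMatchEvaluator(BaseEvaluator):
    """Hypothetical subclass: only evaluate_item has to be written."""

    async def evaluate_item(self, item: EvalInputItem) -> EvalOutputItem:
        score = 1.0 if item.expected.strip() == item.generated.strip() else 0.0
        return EvalOutputItem(id=item.id, score=score)


if __name__ == "__main__":
    data = EvalInput([EvalInputItem("1", "Paris", "Paris"), EvalInputItem("2", "4", "5")])
    output = asyncio.run(ExactMatchEvaluator(max_concurrency=2).evaluate(data))
    print(output.average_score)  # 0.5
```

The design point this sketch tries to capture is that a subclass writes only `evaluate_item`, while the bounded fan-out, progress tracking, and averaging stay in one shared `evaluate` method.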

By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.


Copilot

Pull Request Overview


Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/aiq/eval/evaluator/custom_base_evaluator.py` | New `BaseEvaluator` ABC with `evaluate_item` and a shared `evaluate` method for custom evaluators |
| `docs/source/extend/custom-evaluator.md` | Updated guide to use `BaseEvaluator` and `register_evaluator`; streamlined similarity example |

Comments suppressed due to low confidence (3)

src/aiq/eval/evaluator/custom_base_evaluator.py:47

There are no tests covering the new BaseEvaluator.evaluate method. Add unit tests to verify concurrency limits, progress bar updates, and average-score logic for both numeric and non-numeric scores.

`async def evaluate(self, eval_input: EvalInput) -> EvalOutput:`

docs/source/extend/custom-evaluator.md:64

Fix the typo in the description string: "Simlaity Evaluator" should be "Similarity Evaluator".

`yield EvaluatorInfo(config=config, evaluate_fn=evaluator.evaluate, description="Simlaity Evaluator")`

docs/source/extend/custom-evaluator.md:94

The docs describe EvalInputItem and EvalOutputItem but omit the EvalOutput type returned by evaluate. Consider adding a section explaining its fields (e.g., average_score, eval_output_items).

**EvalOutputItem**


mdemoret-nv

The abstraction looks good, but why aren't we using this abstraction in our own evaluators?


Comments addressed. I am merging this after a secondary review from Yuchen.

AnuradhaKaruppiah · 2025-06-12T22:11:04Z

/merge

This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.

- Extract concurrency, progress bar, and average-score computation into BaseEvaluator.
- Simplify the similarity evaluator example to subclass BaseEvaluator.
- Refresh docs to explain the new extension flow with the register_evaluator decorator.

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Yuchen Zhang (https://github.com/yczhang-nv)

URL: NVIDIA#358

AnuradhaKaruppiah added 2 commits June 10, 2025 07:50

Pull the boiler plate code needed for custom evaluators into a base class

7480f70

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

More docs updates

f0dfb3b

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

AnuradhaKaruppiah self-assigned this Jun 10, 2025

AnuradhaKaruppiah added the `improvement` (Improvement to existing functionality) and `non-breaking` (Non-breaking change) labels Jun 10, 2025

AnuradhaKaruppiah requested a review from Copilot June 10, 2025 15:34

Copilot AI reviewed Jun 10, 2025

src/aiq/eval/evaluator/custom_base_evaluator.py (resolved)

AnuradhaKaruppiah added 4 commits June 10, 2025 08:38

Initialize pbar

114f551

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

Fix CI failures

05b2c1f

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

Add unit tests for the custom evaluator

b676ed7

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

Minor enhancement to tqdm_desc

d4fe0c8

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

mdemoret-nv previously requested changes Jun 11, 2025

src/aiq/eval/evaluator/custom_base_evaluator.py (resolved)

src/aiq/eval/evaluator/custom_base_evaluator.py (resolved)

AnuradhaKaruppiah added 4 commits June 11, 2025 13:23

Merge remote-tracking branch 'upstream/develop' into ak-custom-eval-simplify

59b441b

Rename the custom_base_evaluator to base_evaluator

18d2e52

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

Update tunable and trajectory evaluators to use the base evaluator class

f5dab99

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

Fix CI errors

afb49c1

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

yczhang-nv approved these changes Jun 12, 2025


AnuradhaKaruppiah added 2 commits June 12, 2025 14:08

Merge remote-tracking branch 'upstream/develop' into ak-custom-eval-simplify

e7a2d3e

Fix unit tests failures

3a85cea

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

rapids-bot bot merged commit c245b81 into NVIDIA:develop Jun 12, 2025
12 checks passed

AnuradhaKaruppiah deleted the ak-custom-eval-simplify branch June 25, 2025 15:33

3 participants