KEMBAR78
Simplify custom evaluator definition by AnuradhaKaruppiah · Pull Request #358 · NVIDIA/NeMo-Agent-Toolkit · GitHub
Skip to content

Conversation

@AnuradhaKaruppiah
Copy link
Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah commented Jun 10, 2025

Description

This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.

  • Extract concurrency, progress bar, and average-score computation into BaseEvaluator.
  • Simplify the similarity evaluator example to subclass BaseEvaluator.
  • Refresh docs to explain the new extension flow with the register_evaluator decorator.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

…lass

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah AnuradhaKaruppiah self-assigned this Jun 10, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah added improvement Improvement to existing functionality non-breaking Non-breaking change labels Jun 10, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.

  • Extract concurrency, progress bar, and average-score computation into BaseEvaluator.
  • Simplify the similarity evaluator example to subclass BaseEvaluator.
  • Refresh docs to explain the new extension flow with the register_evaluator decorator.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/aiq/eval/evaluator/custom_base_evaluator.py New BaseEvaluator ABC with evaluate_item and shared evaluate method for custom evaluators
docs/source/extend/custom-evaluator.md Updated guide to use BaseEvaluator and register_evaluator, streamlined similarity example
Comments suppressed due to low confidence (3)

src/aiq/eval/evaluator/custom_base_evaluator.py:47

  • There are no tests covering the new BaseEvaluator.evaluate method. Add unit tests to verify concurrency limits, progress bar updates, and average-score logic for both numeric and non-numeric scores.
async def evaluate(self, eval_input: EvalInput) -> EvalOutput:

docs/source/extend/custom-evaluator.md:64

  • Fix the typo in the description string: "Simlaity Evaluator" should be "Similarity Evaluator".
yield EvaluatorInfo(config=config, evaluate_fn=evaluator.evaluate, description="Simlaity Evaluator")

docs/source/extend/custom-evaluator.md:94

  • The docs describe EvalInputItem and EvalOutputItem but omit the EvalOutput type returned by evaluate. Consider adding a section explaining its fields (e.g., average_score, eval_output_items).
**EvalOutputItem**

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Copy link
Collaborator

@mdemoret-nv mdemoret-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstraction looks good, but why arent we using this abstraction in our own evaluators?

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah AnuradhaKaruppiah dismissed mdemoret-nv’s stale review June 12, 2025 22:10

Comments addressed, I am merging this after secondary review from Yuchen

@AnuradhaKaruppiah
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit c245b81 into NVIDIA:develop Jun 12, 2025
12 checks passed
@AnuradhaKaruppiah AnuradhaKaruppiah deleted the ak-custom-eval-simplify branch June 25, 2025 15:33
AnuradhaKaruppiah added a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.

- Extract concurrency, progress bar, and average-score computation into BaseEvaluator.
- Simplify the similarity evaluator example to subclass BaseEvaluator.
- Refresh docs to explain the new extension flow with the register_evaluator decorator.

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Yuchen Zhang (https://github.com/yczhang-nv)

URL: NVIDIA#358
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025
This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.

- Extract concurrency, progress bar, and average-score computation into BaseEvaluator.
- Simplify the similarity evaluator example to subclass BaseEvaluator.
- Refresh docs to explain the new extension flow with the register_evaluator decorator.

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Yuchen Zhang (https://github.com/yczhang-nv)

URL: NVIDIA#358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improvement to existing functionality non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants