-
Notifications
You must be signed in to change notification settings - Fork 396
Simplify custom evaluator definition #358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify custom evaluator definition #358
Conversation
…lass Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.
- Extract concurrency, progress bar, and average-score computation into
BaseEvaluator. - Simplify the similarity evaluator example to subclass
BaseEvaluator. - Refresh docs to explain the new extension flow with the
register_evaluatordecorator.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/aiq/eval/evaluator/custom_base_evaluator.py | New BaseEvaluator ABC with evaluate_item and shared evaluate method for custom evaluators |
| docs/source/extend/custom-evaluator.md | Updated guide to use BaseEvaluator and register_evaluator, streamlined similarity example |
Comments suppressed due to low confidence (3)
src/aiq/eval/evaluator/custom_base_evaluator.py:47
- There are no tests covering the new
BaseEvaluator.evaluatemethod. Add unit tests to verify concurrency limits, progress bar updates, and average-score logic for both numeric and non-numeric scores.
async def evaluate(self, eval_input: EvalInput) -> EvalOutput:
docs/source/extend/custom-evaluator.md:64
- Fix the typo in the description string: "Simlaity Evaluator" should be "Similarity Evaluator".
yield EvaluatorInfo(config=config, evaluate_fn=evaluator.evaluate, description="Simlaity Evaluator")
docs/source/extend/custom-evaluator.md:94
- The docs describe
EvalInputItemandEvalOutputItembut omit theEvalOutputtype returned byevaluate. Consider adding a section explaining its fields (e.g.,average_score,eval_output_items).
**EvalOutputItem**
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The abstraction looks good, but why arent we using this abstraction in our own evaluators?
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Comments addressed, I am merging this after secondary review from Yuchen
|
/merge |
This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it. - Extract concurrency, progress bar, and average-score computation into BaseEvaluator. - Simplify the similarity evaluator example to subclass BaseEvaluator. - Refresh docs to explain the new extension flow with the register_evaluator decorator. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) URL: NVIDIA#358
This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it. - Extract concurrency, progress bar, and average-score computation into BaseEvaluator. - Simplify the similarity evaluator example to subclass BaseEvaluator. - Refresh docs to explain the new extension flow with the register_evaluator decorator. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) URL: NVIDIA#358
Description
This PR introduces a new BaseEvaluator abstract class to centralize common evaluator logic and updates the documentation and example code to use it.
By Submitting this PR I confirm: