Intermediate steps: evaluation fix #312
Add event_filter to intermediate steps get_context Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Add even_type = [IntermediateStepType.TOOL_END, IntermediateStepType.LLM_END, IntermediateStepType.CUSTOM_END] Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
add reference to trajectory evaluation Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
ok to test /b372a2d
Pull Request Overview
This PR refines intermediate steps in metrics evaluation by including LLM outputs and a new CUSTOM_END event, and it adds a reference answer parameter for trajectory evaluation.
- Includes CUSTOM_END in event filters for both trajectory and RAG evaluators.
- Passes a new reference answer to the trajectory evaluator chain.
- Updates the get_context method in IntermediateStepAdapter to support custom events and formatted outputs.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/aiq/eval/trajectory_evaluator/evaluate.py | Updates event_filter and adds a reference answer to evaluation. |
| src/aiq/eval/rag_evaluator/evaluate.py | Updates event_filter to include CUSTOM_END for context extraction. |
| src/aiq/eval/intermediate_step_adapter.py | Modifies get_context signature to accept event_filter and formats context outputs. |
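The `get_context` change summarized above can be pictured with a minimal sketch. `StepType` and `Step` are hypothetical, simplified stand-ins for `IntermediateStepType` and the real intermediate step objects, `LLM_START` is an assumed extra member included only to show a filtered-out event, and the label format is invented for illustration; this is not the code from the PR.

```python
from dataclasses import dataclass
from enum import Enum


class StepType(Enum):
    # Hypothetical stand-in for IntermediateStepType.
    TOOL_END = "TOOL_END"
    LLM_END = "LLM_END"
    CUSTOM_END = "CUSTOM_END"
    LLM_START = "LLM_START"  # assumed member, present only to be filtered out


@dataclass
class Step:
    # Hypothetical, simplified intermediate step.
    event_type: StepType
    name: str
    output: str


def get_context(steps, event_filter):
    """Return one labeled string per step whose event type is in event_filter."""
    return [
        f"{step.event_type.value} [{step.name}]: {step.output}"
        for step in steps
        if step.event_type in event_filter and step.output
    ]


steps = [
    Step(StepType.LLM_START, "llm", ""),
    Step(StepType.TOOL_END, "search", "Paris is the capital of France."),
    Step(StepType.LLM_END, "llm", "The answer is Paris."),
    Step(StepType.CUSTOM_END, "reranker", "doc-3"),
]
event_filter = [StepType.TOOL_END, StepType.LLM_END, StepType.CUSTOM_END]
contexts = get_context(steps, event_filter)
```

Passing the filter as an argument lets each evaluator decide which event types count as context, instead of hard-coding that choice inside the adapter.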
/ok to test b372a2d
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Thanks for the PR @titericz. I tried running the trajectory evaluator with the change and I am seeing this warning -
/ok to test b96a856
Drop reference parameter for now as it raises a warning message Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
remove trajectory evaluation changes in the PR Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Add max_concurrency to RAGAS evaluator function to avoid errors due to high LLM backend usage. Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Set max_concurrency=1 to be more conservative; I'm getting a lot of evaluation errors with max_concurrency=8. If necessary, the user can set max_concurrency >1 in config.yaml Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
added max_concurrency to RAGAS metrics Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
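To illustrate why lowering `max_concurrency` can reduce backend errors, here is a rough sketch of a concurrency cap. It uses `asyncio.Semaphore` rather than Ragas itself; `run_bounded`, `fake_llm_call`, and `stats` are hypothetical names invented for this example, not part of the toolkit.

```python
import asyncio


async def run_bounded(items, worker, max_concurrency=1):
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # Each call waits here until a slot is free.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


# Tracks how many simulated LLM calls overlap.
stats = {"in_flight": 0, "peak": 0}


async def fake_llm_call(x):
    # Hypothetical stand-in for a request to an LLM backend.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return x * 2


results = asyncio.run(run_bounded(range(5), fake_llm_call, max_concurrency=1))
```

With `max_concurrency=1` the calls run one at a time, which is slower but puts the least pressure on a rate-limited backend; raising the value trades that safety for throughput.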
/ok to test 5f6c660
Pull Request Overview
This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration along with reference answers for trajectory evaluation.
- Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig
- Extend get_context to filter and label each intermediate step output
- Include LLM outputs in get_agent_actions
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/aiq/eval/rag_evaluator/register.py | Added max_concurrency argument to RAGEvaluator |
| src/aiq/eval/rag_evaluator/evaluate.py | Updated evaluator init, imported and used RunConfig |
| src/aiq/eval/intermediate_step_adapter.py | Extended get_agent_actions and get_context logic |
Comments suppressed due to low confidence (3)
src/aiq/eval/intermediate_step_adapter.py:82
- Passing an empty string to get_agent_action_single for last_llm_end_step will cause attribute errors; use None instead of "" when there is no previous step.
if step.event_type == IntermediateStepType.LLM_END:
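The concern in this comment can be shown with a small sketch. `LLMEndStep` and `describe_tool_call` are hypothetical names and do not reflect the real signature of `get_agent_action_single`; the point is only that `None` makes the "no previous step" case explicit, while an empty string fails as soon as an attribute is read from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMEndStep:
    # Hypothetical stand-in for an LLM_END intermediate step.
    output: str


def describe_tool_call(tool_output: str, last_llm_end_step: Optional[LLMEndStep]) -> str:
    """Pair a tool output with the LLM output that preceded it, if any."""
    # Guarding on None keeps the "no previous step" case explicit.
    # Passing "" instead would reach `.output` on a str and raise AttributeError.
    reasoning = last_llm_end_step.output if last_llm_end_step is not None else ""
    return f"log={reasoning!r} observation={tool_output!r}"


with_step = describe_tool_call("42", LLMEndStep("call calculator"))
without_step = describe_tool_call("42", None)
```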
src/aiq/eval/intermediate_step_adapter.py:94
- [nitpick] The variable agent_actions holds context strings rather than actions; consider renaming to contexts for clarity.
agent_actions = []
src/aiq/eval/rag_evaluator/evaluate.py:132
- There’s no test covering the new max_concurrency path in evaluate; consider adding a unit test that verifies RunConfig is called with the builder-provided concurrency value.
run_config=RunConfig(max_workers=self.max_concurrency),
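A test along the lines suggested here might look like the sketch below. The `Evaluator` class is a hypothetical stand-in that takes its `RunConfig` factory and evaluate function as arguments so they can be mocked; the real `RAGEvaluator` constructor and the way it calls Ragas may differ.

```python
from unittest.mock import MagicMock


class Evaluator:
    # Hypothetical stand-in for the RAG evaluator.
    # run_config_factory plays the role of RunConfig.
    def __init__(self, max_concurrency, run_config_factory, evaluate_fn):
        self.max_concurrency = max_concurrency
        self._run_config_factory = run_config_factory
        self._evaluate_fn = evaluate_fn

    def evaluate(self, dataset):
        run_config = self._run_config_factory(max_workers=self.max_concurrency)
        return self._evaluate_fn(dataset, run_config=run_config)


def test_run_config_receives_max_concurrency():
    run_config_factory = MagicMock(name="RunConfig")
    evaluate_fn = MagicMock(return_value={"score": 1.0})
    evaluator = Evaluator(4, run_config_factory, evaluate_fn)

    result = evaluator.evaluate(["sample"])

    # The builder-provided value must reach the run configuration unchanged.
    run_config_factory.assert_called_once_with(max_workers=4)
    evaluate_fn.assert_called_once_with(["sample"], run_config=run_config_factory.return_value)
    return result


test_result = test_run_config_receives_max_concurrency()
```

In the real code base the same check could presumably be done by patching `RunConfig` where `evaluate.py` imports it, rather than injecting a factory.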
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
/ok to test 52239b4
/merge
This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for ragas evaluation.
- Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig
- Extend get_context to filter and label each intermediate step output
- Include LLM outputs in get_agent_actions
## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.
Authors:
- Gilberto Titericz Junior (https://github.com/titericz)
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)
Approvers:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)
URL: NVIDIA#312
Signed-off-by: Greg Freeman <gfreeman@nvidia.com>
Description
This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for ragas evaluation.
By Submitting this PR I confirm: