Intermediate steps: evaluation fix #312

titericz · 2025-05-22T20:32:01Z

Description

This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for Ragas evaluation.

- Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig
- Extend get_context to filter and label each intermediate step output
- Include LLM outputs in get_agent_actions
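The filtering and labeling idea behind the get_context change can be sketched as follows. This is a minimal, self-contained illustration, not the adapter's actual code: `IntermediateStepType` and `Step` are hypothetical stand-ins for the aiq types, and the `[EVENT_TYPE] name: output` label format is an assumption. The default filter uses the three event types named in the commit history (TOOL_END, LLM_END, CUSTOM_END).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IntermediateStepType(str, Enum):
    # Hypothetical stand-in for aiq's IntermediateStepType (only the members used here).
    LLM_START = "LLM_START"
    LLM_END = "LLM_END"
    TOOL_END = "TOOL_END"
    CUSTOM_END = "CUSTOM_END"


@dataclass
class Step:
    # Hypothetical minimal intermediate step: event type, producer name, output text.
    event_type: IntermediateStepType
    name: str
    output: str


# Event types kept by default, matching the list in the commit history.
DEFAULT_EVENT_FILTER = [
    IntermediateStepType.TOOL_END,
    IntermediateStepType.LLM_END,
    IntermediateStepType.CUSTOM_END,
]


def get_context(steps: list[Step],
                event_filter: Optional[list[IntermediateStepType]] = None) -> list[str]:
    """Return one labeled context string per step whose type is in event_filter."""
    if event_filter is None:
        event_filter = DEFAULT_EVENT_FILTER
    contexts = []
    for step in steps:
        if step.event_type not in event_filter:
            continue
        # Prefix each output with its source so the judge LLM can tell tool
        # results, LLM outputs, and custom events apart.
        contexts.append(f"[{step.event_type.value}] {step.name}: {step.output}")
    return contexts
```

Naming the result list `contexts` (rather than `agent_actions`) follows the reviewer nitpick further down the thread.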

By submitting this PR I confirm:

- I am familiar with the Contributing Guidelines.
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.


copy-pr-bot · 2025-05-22T20:32:04Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

AnuradhaKaruppiah · 2025-05-22T20:42:04Z

/ok to test b372a2d

Copilot

Pull Request Overview

This PR refines intermediate steps in metrics evaluation by including LLM outputs and a new CUSTOM_END event, and it adds a reference answer parameter for trajectory evaluation.

- Includes CUSTOM_END in event filters for both trajectory and RAG evaluators.
- Passes a new reference answer to the trajectory evaluator chain.
- Updates the get_context method in IntermediateStepAdapter to support custom events and formatted outputs.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
src/aiq/eval/trajectory_evaluator/evaluate.py	Updates event_filter and adds a reference answer to evaluation.
src/aiq/eval/rag_evaluator/evaluate.py	Updates event_filter to include CUSTOM_END for context extraction.
src/aiq/eval/intermediate_step_adapter.py	Modifies get_context signature to accept event_filter and formats context outputs.

src/aiq/eval/trajectory_evaluator/evaluate.py

src/aiq/eval/intermediate_step_adapter.py

AnuradhaKaruppiah · 2025-05-23T02:19:53Z

/ok to test b372a2d


AnuradhaKaruppiah · 2025-05-23T02:33:41Z

Thanks for the PR @titericz. I tried running the trajectory evaluator with the change and I am seeing this warning:

Evaluating Ragas nv_context_relevance:   0%|          | 0/5 [00:00<?, ?it/s]
/home/devcontainers/dev/forks/agentiq/.venv/lib/python3.12/site-packages/langchain/evaluation/schema.py:130: UserWarning: Ignoring reference in TrajectoryEvalChain, as it is not expected.
  warn(self._skip_reference_warning)
/home/devcontainers/dev/forks/agentiq/.venv/lib/python3.12/site-packages/langchain/evaluation/schema.py:130: UserWarning: Ignoring reference in TrajectoryEvalChain, as it is not expected.
  warn(self._skip_reference_warning)
/home/devcontainers/dev/forks/agentiq/.venv/lib/python3.12/site-packages/langchain/evaluation/schema.py:130: UserWarning: Ignoring reference in TrajectoryEvalChain, as it is not expected.
  warn(self._skip_reference_warning)
/home/devcontainers/dev/forks/agentiq/.venv/lib/python3.12/site-packages/langchain/evaluation/schema.py:130: UserWarning: Ignoring reference in TrajectoryEvalChain, as it is not expected.
  warn(self._skip_reference_warning)
/home/devcontainers/dev/forks/agentiq/.venv/lib/python3.12/site-packages/langchain/evaluation/schema.py:130: UserWarning: Ignoring reference in TrajectoryEvalChain, as it is not expected.
  warn(self._skip_reference_warning)
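The warning comes from LangChain's evaluator argument check: a reference was passed to an evaluator whose `requires_reference` property is false. The PR resolved this by dropping the reference. If a reference is reintroduced later, one option is to forward it only when the evaluator asks for it. The sketch below assumes that approach; `build_trajectory_eval_kwargs` is a hypothetical helper, and the keyword names follow LangChain's `evaluate_agent_trajectory` / `aevaluate_agent_trajectory` signature (`input`, `prediction`, `agent_trajectory`, `reference`).

```python
from typing import Any, Optional, Sequence


def build_trajectory_eval_kwargs(
    evaluator: Any,
    question: str,
    answer: str,
    trajectory: Sequence[tuple[Any, str]],
    reference: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build kwargs for a LangChain trajectory evaluator.

    The reference answer is forwarded only when the evaluator declares that it
    needs one, which avoids the "Ignoring reference ... as it is not expected"
    UserWarning.
    """
    kwargs: dict[str, Any] = {
        "input": question,
        "prediction": answer,
        "agent_trajectory": trajectory,
    }
    # LangChain evaluators expose `requires_reference`; treat a missing
    # attribute as "no reference needed".
    if reference is not None and getattr(evaluator, "requires_reference", False):
        kwargs["reference"] = reference
    return kwargs
```

The result would then be passed as `await evaluator.aevaluate_agent_trajectory(**kwargs)`. For `TrajectoryEvalChain`, whose `requires_reference` is false per the warning above, this simply omits the reference, which matches where the PR ended up.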

AnuradhaKaruppiah · 2025-05-23T02:38:03Z

/ok to test b96a856


AnuradhaKaruppiah · 2025-05-25T15:47:26Z

/ok to test 5f6c660

Copilot

Pull Request Overview

This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration along with reference answers for trajectory evaluation.

- Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig
- Extend get_context to filter and label each intermediate step output
- Include LLM outputs in get_agent_actions

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
src/aiq/eval/rag_evaluator/register.py	Added `max_concurrency` argument to `RAGEvaluator`
src/aiq/eval/rag_evaluator/evaluate.py	Updated evaluator init, imported and used `RunConfig`
src/aiq/eval/intermediate_step_adapter.py	Extended `get_agent_actions` and `get_context` logic

Comments suppressed due to low confidence (3)

src/aiq/eval/intermediate_step_adapter.py:82

Passing an empty string to get_agent_action_single for last_llm_end_step will cause attribute errors; use None instead of "" when there is no previous step.

if step.event_type == IntermediateStepType.LLM_END:
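A minimal sketch of the "use None instead of \"\"" suggestion, with LLM outputs kept in the action list. Everything here is hypothetical: `Step`, the string event tags, the `(label, observation)` tuple shape, and attaching the previous LLM output to the next tool call's label are illustrative assumptions, not the adapter's real implementation. The point is that `Optional[Step]` plus an explicit `is not None` check never dereferences a placeholder string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    # Hypothetical minimal step: event tag ("LLM_END" / "TOOL_END"), producer name, output.
    event_type: str
    name: str
    output: str


def get_agent_action_single(step: Step, last_llm_end_step: Optional[Step]) -> tuple[str, str]:
    """Return an (action_label, observation) pair for one tool step.

    last_llm_end_step is None when no LLM_END step has been seen yet. With ""
    as the placeholder, an access such as last_llm_end_step.output raises
    AttributeError; None plus an explicit check keeps the type Optional[Step].
    """
    thought = last_llm_end_step.output if last_llm_end_step is not None else ""
    label = f"{step.name} ({thought})" if thought else step.name
    return (label, step.output)


def get_agent_actions(steps: list[Step]) -> list[tuple[str, str]]:
    actions: list[tuple[str, str]] = []
    last_llm_end_step: Optional[Step] = None  # no LLM_END step seen yet
    for step in steps:
        if step.event_type == "LLM_END":
            # Keep LLM outputs in the trajectory as well.
            actions.append((f"llm {step.name}", step.output))
            last_llm_end_step = step
        elif step.event_type == "TOOL_END":
            actions.append(get_agent_action_single(step, last_llm_end_step))
    return actions
```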

src/aiq/eval/intermediate_step_adapter.py:94

[nitpick] The variable agent_actions holds context strings rather than actions; consider renaming to contexts for clarity.

agent_actions = []

src/aiq/eval/rag_evaluator/evaluate.py:132

There’s no test covering the new max_concurrency path in evaluate; consider adding a unit test that verifies RunConfig is called with the builder-provided concurrency value.

run_config=RunConfig(max_workers=self.max_concurrency),
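Following the suggestion above, a test can check that the concurrency value reaches `RunConfig(max_workers=...)`. The sketch below is self-contained, so `RunConfig`, `ragas_evaluate`, and `RAGEvaluator` are stand-ins (assumptions, not the real ragas or aiq classes); in the real suite one would more likely patch the `evaluate` name imported by `src/aiq/eval/rag_evaluator/evaluate.py` with `unittest.mock` and inspect the `run_config` it received.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RunConfig:
    # Stand-in for ragas' RunConfig; only the field this PR sets is modeled.
    max_workers: int


def ragas_evaluate(dataset: Any, metrics: list, run_config: RunConfig) -> dict:
    # Stand-in for ragas.evaluate: echo the run_config so a test can inspect it.
    return {"num_metrics": len(metrics), "run_config": run_config}


class RAGEvaluator:
    """Hypothetical minimal evaluator mirroring the PR's max_concurrency plumbing."""

    def __init__(self, metrics: list, max_concurrency: int):
        self.metrics = metrics
        self.max_concurrency = max_concurrency  # supplied by the builder in the PR

    def evaluate(self, dataset: Any) -> dict:
        # Cap parallel judge-LLM calls at the configured concurrency.
        return ragas_evaluate(dataset, self.metrics,
                              run_config=RunConfig(max_workers=self.max_concurrency))


def test_run_config_uses_max_concurrency() -> None:
    result = RAGEvaluator(metrics=["context_relevance"], max_concurrency=1).evaluate(dataset=[])
    assert result["run_config"].max_workers == 1
```

A low value such as 1 trades throughput for fewer backend errors, which is the trade-off described in the commit history.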

src/aiq/eval/intermediate_step_adapter.py


AnuradhaKaruppiah · 2025-05-25T16:07:00Z

/ok to test 52239b4

AnuradhaKaruppiah · 2025-05-25T16:13:37Z

/merge

This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for ragas evaluation. - Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig - Extend get_context to filter and label each intermediate step output - Include LLM outputs in get_agent_actions ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Gilberto Titericz Junior (https://github.com/titericz) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: NVIDIA#312 Signed-off-by: Greg Freeman <gfreeman@nvidia.com>

This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for ragas evaluation. - Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig - Extend get_context to filter and label each intermediate step output - Include LLM outputs in get_agent_actions ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Gilberto Titericz Junior (https://github.com/titericz) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: NVIDIA#312 Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>

This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for ragas evaluation. - Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig - Extend get_context to filter and label each intermediate step output - Include LLM outputs in get_agent_actions ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Gilberto Titericz Junior (https://github.com/titericz) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: NVIDIA#312

titericz added 3 commits May 22, 2025 16:16

Update intermediate_step_adapter.py

8aab769

Add event_filter to intermediate steps get_context Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

5adab49

Add event_type = [IntermediateStepType.TOOL_END, IntermediateStepType.LLM_END, IntermediateStepType.CUSTOM_END] Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

b372a2d

add reference to trajectory evaluation Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

AnuradhaKaruppiah added improvement Improvement to existing functionality non-breaking Non-breaking change labels May 22, 2025

AnuradhaKaruppiah requested a review from Copilot May 22, 2025 20:42

Copilot AI reviewed May 22, 2025


src/aiq/eval/trajectory_evaluator/evaluate.py (outdated, resolved)

src/aiq/eval/trajectory_evaluator/evaluate.py (outdated, resolved)

src/aiq/eval/intermediate_step_adapter.py (outdated, resolved)

AnuradhaKaruppiah approved these changes May 23, 2025


Fix CI failures

b96a856

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

titericz added 8 commits May 23, 2025 11:22

Update evaluate.py

3244062

Drop reference parameter for now as it raises a warning message Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

c9c06c5

remove trajectory evaluation changes in the PR Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

7bf56b5

Add max_concurrency to RAGAS evaluator function to avoid getting errors due to high LLM backend usage. Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

97a8b37

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

0e9ef9e

max_concurrency=1 to be more conservative, I'm getting a lot of evaluation errors with max_concurrency=8. If necessary, the user can set max_concurrency > 1 in config.yaml Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update register.py

eb47a89

added max_concurrency to RAGAS metrics Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

7513678

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

Update evaluate.py

5f6c660

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>

AnuradhaKaruppiah requested a review from Copilot May 25, 2025 15:47

Copilot AI reviewed May 25, 2025


src/aiq/eval/intermediate_step_adapter.py (resolved)

Update tests

52239b4

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

rapids-bot bot merged commit 22a11ec into NVIDIA:develop May 25, 2025
12 checks passed


titericz commented May 22, 2025 (edited by AnuradhaKaruppiah)


2 participants
