Intermediate steps: evaluation fix by titericz · Pull Request #312 · NVIDIA/NeMo-Agent-Toolkit · GitHub

Conversation

@titericz
Contributor

@titericz titericz commented May 22, 2025

Description

This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for ragas evaluation.

  • Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig
  • Extend get_context to filter and label each intermediate step output
  • Include LLM outputs in get_agent_actions
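
The first bullet can be sketched with stand-in classes (the real RunConfig lives in ragas.run_config and also carries timeout/retry fields; the actual RAGEvaluator signature may differ):

```python
from dataclasses import dataclass

# Stand-in for ragas.run_config.RunConfig; only max_workers matters here.
@dataclass
class RunConfig:
    max_workers: int = 16

class RAGEvaluator:
    """Illustrative: the builder forwards its max_concurrency setting here."""

    def __init__(self, max_concurrency: int = 1):
        self.max_concurrency = max_concurrency

    def make_run_config(self) -> RunConfig:
        # Cap Ragas' worker pool at the configured concurrency so a busy
        # LLM backend is not flooded with parallel judge calls.
        return RunConfig(max_workers=self.max_concurrency)

evaluator = RAGEvaluator(max_concurrency=4)
print(evaluator.make_run_config().max_workers)  # 4
```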

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

titericz added 3 commits May 22, 2025 16:16
Add event_filter to intermediate steps get_context

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Add event_type = [IntermediateStepType.TOOL_END, IntermediateStepType.LLM_END, IntermediateStepType.CUSTOM_END]

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
add reference to trajectory evaluation

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
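
The event filter named in the commits above amounts to keeping only "end" events; a sketch (the enum values mirror the commit message, while the step shape is illustrative — the real adapter works on typed step objects):

```python
from enum import Enum

class IntermediateStepType(Enum):
    LLM_END = "LLM_END"
    TOOL_END = "TOOL_END"
    CUSTOM_END = "CUSTOM_END"
    TOOL_START = "TOOL_START"

# Default filter from the commit: only events whose outputs should
# land in the evaluation context.
DEFAULT_EVENT_FILTER = [IntermediateStepType.TOOL_END,
                        IntermediateStepType.LLM_END,
                        IntermediateStepType.CUSTOM_END]

def filter_steps(steps, event_filter=DEFAULT_EVENT_FILTER):
    return [s for s in steps if s["event_type"] in event_filter]

steps = [
    {"event_type": IntermediateStepType.TOOL_START, "output": None},
    {"event_type": IntermediateStepType.TOOL_END, "output": "search results"},
    {"event_type": IntermediateStepType.LLM_END, "output": "model answer"},
]
print([s["output"] for s in filter_steps(steps)])  # ['search results', 'model answer']
```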
@copy-pr-bot

copy-pr-bot bot commented May 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@AnuradhaKaruppiah AnuradhaKaruppiah added the "improvement" (Improvement to existing functionality) and "non-breaking" (Non-breaking change) labels May 22, 2025
@AnuradhaKaruppiah
Contributor

ok to test /b372a2d

@AnuradhaKaruppiah AnuradhaKaruppiah requested a review from Copilot May 22, 2025 20:42
Contributor

Copilot AI left a comment


Pull Request Overview

This PR refines intermediate steps in metrics evaluation by including LLM outputs and a new CUSTOM_END event, and it adds a reference answer parameter for trajectory evaluation.

  • Includes CUSTOM_END in event filters for both trajectory and RAG evaluators.
  • Passes a new reference answer to the trajectory evaluator chain.
  • Updates the get_context method in IntermediateStepAdapter to support custom events and formatted outputs.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Files reviewed:

  • src/aiq/eval/trajectory_evaluator/evaluate.py: updates event_filter and adds a reference answer to evaluation.
  • src/aiq/eval/rag_evaluator/evaluate.py: updates event_filter to include CUSTOM_END for context extraction.
  • src/aiq/eval/intermediate_step_adapter.py: modifies the get_context signature to accept event_filter and formats context outputs.
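
Labeling each filtered step's output, as described for get_context, might look like this (a sketch on plain dicts; the real adapter uses typed step objects):

```python
def get_context(steps) -> str:
    # Prefix each output with its event type so the evaluator can tell
    # tool results apart from LLM generations in the assembled context.
    lines = [f"[{step['event_type']}] {step['output']}" for step in steps]
    return "\n".join(lines)

steps = [{"event_type": "TOOL_END", "output": "search results"},
         {"event_type": "LLM_END", "output": "model answer"}]
print(get_context(steps))
# [TOOL_END] search results
# [LLM_END] model answer
```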

@AnuradhaKaruppiah
Contributor

/ok to test b372a2d

@AnuradhaKaruppiah
Contributor

Thanks for the PR @titericz. I tried running the trajectory evaluator with the change and I am seeing this warning -

Evaluating Ragas nv_context_relevance:   0%| 0/5 [00:00<?, ?it/s]
/home/devcontainers/dev/forks/agentiq/.venv/lib/python3.12/site-packages/langchain/evaluation/schema.py:130: UserWarning: Ignoring reference in TrajectoryEvalChain, as it is not expected.
  warn(self._skip_reference_warning)
  (warning repeated five times, once per evaluated item)

@AnuradhaKaruppiah
Contributor

/ok to test b96a856

titericz added 8 commits May 23, 2025 11:22
Drop reference parameter for now as it raises a warning message

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Remove trajectory evaluation changes from the PR

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Add max_concurrency to RAGAS evaluator function to avoid getting errors due to high LLM backend usage.

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Set max_concurrency=1 to be more conservative; I'm getting a lot of evaluation errors with max_concurrency=8. If necessary, the user can set max_concurrency > 1 in config.yml

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
added max_concurrency to RAGAS metrics

Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
Signed-off-by: Gilberto Titericz Junior <titericz@yahoo.com>
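
The conservative default described in the commits above might be overridden in the workflow's evaluation config roughly like this (key names are illustrative, not the exact toolkit schema):

```yaml
eval:
  general:
    max_concurrency: 1        # safe default; raise only if the LLM backend copes
  evaluators:
    rag_accuracy:
      _type: ragas            # illustrative evaluator entry
      llm_name: eval_llm
```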
@AnuradhaKaruppiah
Contributor

/ok to test 5f6c660

@AnuradhaKaruppiah AnuradhaKaruppiah requested a review from Copilot May 25, 2025 15:47
Contributor

Copilot AI left a comment


Pull Request Overview

This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration along with reference answers for trajectory evaluation.

  • Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig
  • Extend get_context to filter and label each intermediate step output
  • Include LLM outputs in get_agent_actions

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Files reviewed:

  • src/aiq/eval/rag_evaluator/register.py: added max_concurrency argument to RAGEvaluator.
  • src/aiq/eval/rag_evaluator/evaluate.py: updated evaluator init; imported and used RunConfig.
  • src/aiq/eval/intermediate_step_adapter.py: extended get_agent_actions and get_context logic.
Comments suppressed due to low confidence (3)

src/aiq/eval/intermediate_step_adapter.py:82

  • Passing an empty string to get_agent_action_single for last_llm_end_step will cause attribute errors; use None instead of "" when there is no previous step.
if step.event_type == IntermediateStepType.LLM_END:

src/aiq/eval/intermediate_step_adapter.py:94

  • [nitpick] The variable agent_actions holds context strings rather than actions; consider renaming to contexts for clarity.
agent_actions = []

src/aiq/eval/rag_evaluator/evaluate.py:132

  • There’s no test covering the new max_concurrency path in evaluate; consider adding a unit test that verifies RunConfig is called with the builder-provided concurrency value.
run_config=RunConfig(max_workers=self.max_concurrency),
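
The test Copilot asks for could be sketched with unittest.mock against stand-in classes (the real ones live under aiq.eval and ragas; the names here are placeholders):

```python
from unittest import mock

class RunConfig:
    """Stand-in for ragas.run_config.RunConfig."""
    def __init__(self, max_workers: int = 16):
        self.max_workers = max_workers

class RAGEvaluator:
    """Stand-in evaluator that builds its RunConfig from max_concurrency."""
    def __init__(self, max_concurrency: int):
        self.max_concurrency = max_concurrency

    def evaluate(self) -> RunConfig:
        # RunConfig is looked up at call time from module globals, so the
        # patched version below is the one this call constructs.
        return RunConfig(max_workers=self.max_concurrency)

def test_run_config_uses_builder_concurrency():
    with mock.patch(f"{__name__}.RunConfig", wraps=RunConfig) as rc:
        RAGEvaluator(max_concurrency=8).evaluate()
        rc.assert_called_once_with(max_workers=8)

test_run_config_uses_builder_concurrency()
print("ok")
```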

@AnuradhaKaruppiah
Contributor

/ok to test 52239b4

@AnuradhaKaruppiah
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 22a11ec into NVIDIA:develop May 25, 2025
12 checks passed
gfreeman-nvidia pushed a commit to gfreeman-nvidia/AIQToolkit that referenced this pull request May 30, 2025
This PR enhances intermediate step processing by including LLM outputs in context and adds a max_concurrency configuration for ragas evaluation.

- Pass max_concurrency from builder to RAGEvaluator and use it in the Ragas RunConfig
- Extend get_context to filter and label each intermediate step output
- Include LLM outputs in get_agent_actions

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Gilberto Titericz Junior (https://github.com/titericz)
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#312
Signed-off-by: Greg Freeman <gfreeman@nvidia.com>
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025