Add override option to the eval CLI command by Hritik003 · Pull Request #129 · NVIDIA/NeMo-Agent-Toolkit · GitHub

Conversation

@Hritik003
Contributor

@Hritik003 Hritik003 commented Apr 14, 2025

Description

Closes Issue #78

Changes

AgentIQ currently allows you to override options from the config file when using the aiq run command. With this change, the aiq eval command supports the same --override options.

cc: @AnuradhaKaruppiah

Test

aiq eval --config_file examples/simple/configs/eval_config.yml \
  --override llms.nim_llm.temperature 0.7 \
  --override llms.nim_llm.model_name meta/llama-3.3-70b-instruct
Response
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.temperature with value: 0.7 with type <class 'float'>)
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.model_name with value: meta/llama-3.3-70b-instruct with type <class 'str'>)
2025-04-14 22:59:35,968 - aiq.cli.cli_utils.config_override - INFO - 

Configuration after overrides:

embedders:
  nv-embedqa-e5-v5:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
eval:
  evaluators:
    rag_accuracy:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: AnswerAccuracy
    rag_groundedness:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ResponseGroundedness
    rag_relevance:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ContextRelevance
    trajectory_accuracy:
      _type: trajectory
      llm_name: nim_trajectory_eval_llm
  general:
    dataset:
      _type: json
      file_path: examples/simple/data/langsmith.json
    output:
      cleanup: true
      dir: ./.tmp/aiq/examples/simple/
    profiler:
      bottleneck_analysis:
        enable_nested_stack: true
      compute_llm_metrics: true
      concurrency_spike_analysis:
        enable: true
        spike_threshold: 7
      csv_exclude_io_text: true
      prompt_caching_prefixes:
        enable: true
        min_frequency: 0.1
      token_uniqueness_forecast: true
      workflow_runtime_forecast: true
functions:
  current_datetime:
    _type: current_datetime
general:
  use_uvloop: true
llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.3-70b-instruct
    temperature: 0.7
  nim_rag_eval_llm:
    _type: nim
    max_tokens: 2
    model_name: meta/llama-3.3-70b-instruct
    temperature: 1.0e-07
    top_p: 0.0001
  nim_trajectory_eval_llm:
    _type: nim
    max_tokens: 1024
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
workflow:
  _type: react_agent
  llm_name: nim_llm
  max_retries: 3
  retry_parsing_errors: true
  tool_names:
  - current_datetime
  verbose: true

2025-04-14 22:59:36,035 - aiq.eval.evaluate - INFO - Starting evaluation run with config file: examples/simple/configs/eval_config.yml
2025-04-14 22:59:36,043 - aiq.eval.evaluate - INFO - Cleaning up output directory .tmp/aiq/examples/simple
2025-04-14 22:59:36,184 - aiq.profiler.decorators - INFO - Langchain callback handler registered
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Filling the prompt variables "tools" and "tool_names", using the tools provided in the config.
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Adding the tools' input schema to the tools' description
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Initialized ReAct Agent Graph
2025-04-14 22:59:36,473 - aiq.agent.react_agent.agent - INFO - ReAct Graph built and compiled successfully
Running workflow:   0%|                                                                                                           | 0/3 [00:00<?, ?it/s]2025-04-
.......................
The agent's thoughts are:
Thought: Since I don't have the specific tool to search for Langsmith documentation and tutorials, I'll try to provide a general answer based on my knowledge.

Langsmith is a platform that allows users to create and test conversational interfaces. To prototype with Langsmith, you can start by creating a new project and defining the conversational flow using their visual interface. You can then add intents, entities, and responses to create a functional conversational interface. Langsmith also provides features like testing and analytics to help you refine your prototype.

Final Answer: To prototype with Langsmith, create a new project, define the conversational flow, add intents, entities, and responses, and use testing and analytics features to refine your prototype.
2025-04-14 22:59:42,047 - aiq.observability.async_otel_listener - INFO - Intermediate step stream completed. No more events will arrive.
Running workflow: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.66s/it]
Evaluating Ragas nv_accuracy:   0%|                                                                                               | 0/3 [00:00<?, ?it/s2025-04-14 22:59:43,516 - aiq.eval.trajectory_evaluator.evaluate - INFO - Running trajectory evaluation with 3 records             | 0/3 [00:00<?, ?it/s]
Evaluating Ragas nv_context_relevance: 100%|██████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.72it/s]
Evaluating Ragas nv_response_groundedness: 100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.06it/s]
Evaluating Ragas nv_accuracy: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00,  2.50s/it]
Evaluating Trajectory: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00,  2.07s/it]
2025-04-14 22:59:49,774 - aiq.profiler.profile_runner - INFO - Wrote combined data to: .tmp/aiq/examples/simple/all_requests_profiler_traces.json
2025-04-14 22:59:49,815 - aiq.profiler.profile_runner - INFO - Wrote merged standardized DataFrame to .tmp/aiq/examples/simple/standardized_data_all.csv
2025-04-14 22:59:49,835 - aiq.profiler.profile_runner - INFO - Wrote inference optimization results to: .tmp/aiq/examples/simple/inference_optimization.json
2025-04-14 22:59:50,271 - aiq.profiler.profile_runner - INFO - Nested stack analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Concurrency spike analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling report to: .tmp/aiq/examples/simple/workflow_profiling_report.txt
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling metrics to: .tmp/aiq/examples/simple/workflow_profiling_metrics.json
2025-04-14 22:59:50,283 - aiq.eval.evaluate - INFO - Workflow output written to 
2025-04-14 22:59:50,283 - aiq.eval.utils.output_uploader - INFO - No S3 config provided; skipping upload.
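The type inference visible at the top of the log (0.7 is set with type `<class 'float'>`, the model name with type `<class 'str'>`) could be sketched as below. This is a hypothetical illustration, not the actual aiq.cli.cli_utils.config_override implementation, which may use a different strategy.

```python
import ast

def infer_value(raw: str):
    """Best-effort conversion of a CLI override string to a typed value.

    ast.literal_eval handles numbers, booleans, lists, etc.; anything it
    cannot parse (e.g. a bare model name) falls back to a plain string.
    """
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw

print(type(infer_value("0.7")))                          # <class 'float'>
print(type(infer_value("meta/llama-3.3-70b-instruct")))  # <class 'str'>
```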

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: Hritik003 <hritik.raj@nutanix.com>
@Hritik003 Hritik003 requested a review from a team as a code owner April 14, 2025 17:31
@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Hritik003
Contributor Author

@AnuradhaKaruppiah any changes required on this PR?

This is for consistency with the start commands

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah
Contributor

LGTM. Thanks for the contribution @Hritik003

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request adds the ability to override configuration options when executing the eval CLI command. Key changes include:

  • Introducing a new function in the evaluation module to load, override, and validate the configuration.
  • Updating the evaluation run configuration type to include an override tuple.
  • Modifying the CLI command to accept and pass override values, with corresponding documentation updates.
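The override mechanism described above amounts to walking a dotted path (e.g. `llms.nim_llm.temperature`) into the nested config and setting the leaf value. A minimal sketch, assuming a plain nested dict; the helper name `set_override` is hypothetical and not the toolkit's actual API:

```python
# Hypothetical sketch of dotted-path overrides; AgentIQ's real
# implementation (aiq.cli.cli_utils.config_override) may differ.
def set_override(config: dict, path: str, value):
    """Set config[a][b][c] = value for a path like 'a.b.c'."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # descend, creating dicts as needed
    node[leaf] = value
    return config

config = {"llms": {"nim_llm": {"_type": "nim", "temperature": 0.0}}}
set_override(config, "llms.nim_llm.temperature", 0.7)
set_override(config, "llms.nim_llm.model_name", "meta/llama-3.3-70b-instruct")
```

After these two calls, the `nim_llm` entry matches the "Configuration after overrides" dump shown in the test output above.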

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
src/aiq/eval/evaluate.py Added apply_overrides to load and validate overridden config options.
src/aiq/eval/config.py Updated the EvaluationRunConfig to include an override parameter.
src/aiq/cli/commands/evaluate.py Extended the CLI command to accept override options and pass them.
docs/source/guides/evaluate.md Documented the new override flag usage with an example.
Comments suppressed due to low confidence (1)

src/aiq/eval/evaluate.py:227

  • [nitpick] Consider renaming 'apply_overrides' to 'load_and_apply_overrides' to better reflect that the method both loads the configuration and applies overrides, improving clarity.
def apply_overrides(self):

@AnuradhaKaruppiah
Contributor

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Apr 22, 2025

/ok to test

@AnuradhaKaruppiah, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@AnuradhaKaruppiah
Contributor

/ok to test 68e2321

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah
Contributor

/ok to test fd53dee

@AnuradhaKaruppiah AnuradhaKaruppiah added the "feature request" (New feature or request) and "non-breaking" (Non-breaking change) labels Apr 22, 2025
@AnuradhaKaruppiah
Contributor

/merge

@rapids-bot rapids-bot bot merged commit c9f8f62 into NVIDIA:develop Apr 22, 2025
10 checks passed
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Apr 23, 2025
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Apr 23, 2025
yczhang-nv pushed a commit to yczhang-nv/NeMo-Agent-Toolkit that referenced this pull request May 8, 2025
Closes Issue NVIDIA#78

## Changes
Currently AgentIQ allows you to override options in the config file for the aiq run command, and now with this change we can similarly run the eval command with the override options.

cc: @AnuradhaKaruppiah

## Test

```
aiq eval --config_file examples/simple/configs/eval_config.yml \
  --override llms.nim_llm.temperature 0.7 \
  --override llms.nim_llm.model_name meta/llama-3.3-70b-instruct
```

<details>
<summary>Response</summary>

```
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.temperature with value: 0.7 with type <class 'float'>)
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.model_name with value: meta/llama-3.3-70b-instruct with type <class 'str'>)
2025-04-14 22:59:35,968 - aiq.cli.cli_utils.config_override - INFO -

Configuration after overrides:

embedders:
  nv-embedqa-e5-v5:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
eval:
  evaluators:
    rag_accuracy:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: AnswerAccuracy
    rag_groundedness:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ResponseGroundedness
    rag_relevance:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ContextRelevance
    trajectory_accuracy:
      _type: trajectory
      llm_name: nim_trajectory_eval_llm
  general:
    dataset:
      _type: json
      file_path: examples/simple/data/langsmith.json
    output:
      cleanup: true
      dir: ./.tmp/aiq/examples/simple/
    profiler:
      bottleneck_analysis:
        enable_nested_stack: true
      compute_llm_metrics: true
      concurrency_spike_analysis:
        enable: true
        spike_threshold: 7
      csv_exclude_io_text: true
      prompt_caching_prefixes:
        enable: true
        min_frequency: 0.1
      token_uniqueness_forecast: true
      workflow_runtime_forecast: true
functions:
  current_datetime:
    _type: current_datetime
general:
  use_uvloop: true
llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.3-70b-instruct
    temperature: 0.7
  nim_rag_eval_llm:
    _type: nim
    max_tokens: 2
    model_name: meta/llama-3.3-70b-instruct
    temperature: 1.0e-07
    top_p: 0.0001
  nim_trajectory_eval_llm:
    _type: nim
    max_tokens: 1024
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
workflow:
  _type: react_agent
  llm_name: nim_llm
  max_retries: 3
  retry_parsing_errors: true
  tool_names:
  - current_datetime
  verbose: true

2025-04-14 22:59:36,035 - aiq.eval.evaluate - INFO - Starting evaluation run with config file: examples/simple/configs/eval_config.yml
2025-04-14 22:59:36,043 - aiq.eval.evaluate - INFO - Cleaning up output directory .tmp/aiq/examples/simple
2025-04-14 22:59:36,184 - aiq.profiler.decorators - INFO - Langchain callback handler registered
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Filling the prompt variables "tools" and "tool_names", using the tools provided in the config.
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Adding the tools' input schema to the tools' description
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Initialized ReAct Agent Graph
2025-04-14 22:59:36,473 - aiq.agent.react_agent.agent - INFO - ReAct Graph built and compiled successfully
Running workflow:   0%|                                                                                                           | 0/3 [00:00<?, ?it/s]2025-04-
.......................
The agent's thoughts are:
Thought: Since I don't have the specific tool to search for Langsmith documentation and tutorials, I'll try to provide a general answer based on my knowledge.

Langsmith is a platform that allows users to create and test conversational interfaces. To prototype with Langsmith, you can start by creating a new project and defining the conversational flow using their visual interface. You can then add intents, entities, and responses to create a functional conversational interface. Langsmith also provides features like testing and analytics to help you refine your prototype.

Final Answer: To prototype with Langsmith, create a new project, define the conversational flow, add intents, entities, and responses, and use testing and analytics features to refine your prototype.
2025-04-14 22:59:42,047 - aiq.observability.async_otel_listener - INFO - Intermediate step stream completed. No more events will arrive.
Running workflow: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.66s/it]
Evaluating Ragas nv_accuracy:   0%|                                                                                               | 0/3 [00:00<?, ?it/s2025-04-14 22:59:43,516 - aiq.eval.trajectory_evaluator.evaluate - INFO - Running trajectory evaluation with 3 records             | 0/3 [00:00<?, ?it/s]
Evaluating Ragas nv_context_relevance: 100%|██████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.72it/s]
Evaluating Ragas nv_response_groundedness: 100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.06it/s]
Evaluating Ragas nv_accuracy: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00,  2.50s/it]
Evaluating Trajectory: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00,  2.07s/it]
2025-04-14 22:59:49,774 - aiq.profiler.profile_runner - INFO - Wrote combined data to: .tmp/aiq/examples/simple/all_requests_profiler_traces.json
2025-04-14 22:59:49,815 - aiq.profiler.profile_runner - INFO - Wrote merged standardized DataFrame to .tmp/aiq/examples/simple/standardized_data_all.csv
2025-04-14 22:59:49,835 - aiq.profiler.profile_runner - INFO - Wrote inference optimization results to: .tmp/aiq/examples/simple/inference_optimization.json
2025-04-14 22:59:50,271 - aiq.profiler.profile_runner - INFO - Nested stack analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Concurrency spike analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling report to: .tmp/aiq/examples/simple/workflow_profiling_report.txt
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling metrics to: .tmp/aiq/examples/simple/workflow_profiling_metrics.json
2025-04-14 22:59:50,283 - aiq.eval.evaluate - INFO - Workflow output written to
2025-04-14 22:59:50,283 - aiq.eval.utils.output_uploader - INFO - No S3 config provided; skipping upload.
```

</details>
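The override mechanics visible in the log above (a dotted key path plus a string value, with the value's type inferred before it is applied to the nested config) can be sketched roughly as follows. Note this is an illustrative sketch, not the toolkit's actual implementation; `apply_override` is a hypothetical helper name:

```python
def apply_override(config: dict, path: str, raw_value: str) -> None:
    """Set a nested config key from a dotted path, inferring a basic value type."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        # Walk (or create) intermediate mappings along the dotted path.
        node = node.setdefault(key, {})
    # Infer a basic type, mirroring how a CLI string like "0.7" becomes a float.
    value: object = raw_value
    for cast in (int, float):
        try:
            value = cast(raw_value)
            break
        except ValueError:
            continue
    node[leaf] = value

config = {"llms": {"nim_llm": {"temperature": 0.0, "model_name": "meta/llama-3.1-70b-instruct"}}}
apply_override(config, "llms.nim_llm.temperature", "0.7")
apply_override(config, "llms.nim_llm.model_name", "meta/llama-3.3-70b-instruct")
```

After both calls, `config["llms"]["nim_llm"]` holds `temperature: 0.7` (as a `float`) and the new model name (as a `str`), matching the "Successfully set override ... with type" lines in the log.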

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AgentIQ/blob/develop/docs/source/advanced/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Hritik Raj (https://github.com/Hritik003)
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#129
Signed-off-by: Yuchen Zhang <134643420+yczhang-nv@users.noreply.github.com>
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025
Labels: feature request (New feature or request), non-breaking (Non-breaking change)
