Add override option to the eval CLI command #129
Conversation
Signed-off-by: Hritik003 <hritik.raj@nutanix.com>
@AnuradhaKaruppiah, are any changes required on this PR?
This is for consistency with the start commands. Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
LGTM. Thanks for the contribution, @Hritik003.
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Pull Request Overview
This pull request adds the ability to override configuration options when executing the eval CLI command. Key changes include:
- Introducing a new function in the evaluation module to load, override, and validate the configuration.
- Updating the evaluation run configuration type to include an override tuple.
- Modifying the CLI command to accept and pass override values, with corresponding documentation updates.
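The CLI change described above can be illustrated with a short, hypothetical sketch. This uses the stdlib `argparse` rather than the project's actual CLI framework, and the option names simply mirror the PR's documented usage: each repeated `--override` flag contributes a `(dotted_path, value)` pair.

```python
import argparse

# Hypothetical sketch of the eval CLI surface: repeated --override flags
# are collected as (dotted_path, value) pairs, as shown in the PR's test.
parser = argparse.ArgumentParser(prog="aiq eval")
parser.add_argument("--config_file", required=True)
parser.add_argument("--override", nargs=2, action="append", default=[],
                    metavar=("PATH", "VALUE"),
                    help="Override a config entry, "
                         "e.g. --override llms.nim_llm.temperature 0.7")

args = parser.parse_args([
    "--config_file", "examples/simple/configs/eval_config.yml",
    "--override", "llms.nim_llm.temperature", "0.7",
    "--override", "llms.nim_llm.model_name", "meta/llama-3.3-70b-instruct",
])

# Each --override arrives as a two-element list; convert to tuples for
# the downstream "override tuple" shape the run config expects.
overrides = [tuple(pair) for pair in args.override]
print(overrides)
# -> [('llms.nim_llm.temperature', '0.7'), ('llms.nim_llm.model_name', 'meta/llama-3.3-70b-instruct')]
```

`action="append"` with `nargs=2` is what allows the flag to be repeated, matching how the `aiq run` command's overrides are already invoked.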
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/aiq/eval/evaluate.py | Added apply_overrides to load and validate overridden config options. |
| src/aiq/eval/config.py | Updated the EvaluationRunConfig to include an override parameter. |
| src/aiq/cli/commands/evaluate.py | Extended the CLI command to accept override options and pass them. |
| docs/source/guides/evaluate.md | Documented the new override flag usage with an example. |
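The run-config change in the table above can be sketched as a dataclass carrying the new override tuple. This is a hypothetical illustration of the shape, not the project's actual `EvaluationRunConfig` definition:

```python
from dataclasses import dataclass

# Hypothetical sketch: the run config gains an 'override' field holding
# (dotted_path, value) string pairs collected from the CLI.
@dataclass
class EvaluationRunConfig:
    config_file: str
    override: tuple = ()  # tuple of (path, value) pairs

cfg = EvaluationRunConfig(
    config_file="examples/simple/configs/eval_config.yml",
    override=(("llms.nim_llm.temperature", "0.7"),),
)
print(len(cfg.override))  # -> 1
```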
Comments suppressed due to low confidence (1)
src/aiq/eval/evaluate.py:227
- [nitpick] Consider renaming 'apply_overrides' to 'load_and_apply_overrides' to better reflect that the method both loads the configuration and applies overrides, improving clarity.
def apply_overrides(self):
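The nitpick above concerns a method that loads the config and then applies the overrides. The override-application step itself might look like the following minimal sketch: a hypothetical helper (not the project's implementation) that walks a dotted path into a nested config dict and coerces the string value to the existing entry's type, matching the `<class 'float'>` / `<class 'str'>` coercion visible in the PR's log output:

```python
from typing import Any

def set_by_dotted_path(config: dict, path: str, raw_value: str) -> Any:
    """Hypothetical helper: set config entry at a dotted path, coercing
    raw_value to the type of the existing value when one is present."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    current = node.get(leaf)
    value: Any = raw_value
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(current, bool):
        value = raw_value.lower() in ("true", "1", "yes")
    elif isinstance(current, int):
        value = int(raw_value)
    elif isinstance(current, float):
        value = float(raw_value)
    node[leaf] = value
    return value

# Usage matching the PR's example overrides:
config = {"llms": {"nim_llm": {"temperature": 0.0, "model_name": "old"}}}
set_by_dotted_path(config, "llms.nim_llm.temperature", "0.7")
set_by_dotted_path(config, "llms.nim_llm.model_name",
                   "meta/llama-3.3-70b-instruct")
print(config["llms"]["nim_llm"]["temperature"])  # -> 0.7
```

Values at paths with no existing entry stay strings under this scheme; a real implementation would validate the resulting config against its schema afterwards.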
/ok to test

@AnuradhaKaruppiah, there was an error processing your request. See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
/ok to test 68e2321

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>

/ok to test fd53dee

/merge
Closes Issue NVIDIA#78

## Changes

Currently AgentIQ allows you to override options in the config file for the `aiq run` command; with this change, the `aiq eval` command accepts the same `--override` options.

cc: @AnuradhaKaruppiah

## Test

```
aiq eval --config_file examples/simple/configs/eval_config.yml \
    --override llms.nim_llm.temperature 0.7 \
    --override llms.nim_llm.model_name meta/llama-3.3-70b-instruct
```

<details>
<summary>Response</summary>

```
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.temperature with value: 0.7 with type <class 'float'>)
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.model_name with value: meta/llama-3.3-70b-instruct with type <class 'str'>)
2025-04-14 22:59:35,968 - aiq.cli.cli_utils.config_override - INFO - Configuration after overrides:
embedders:
  nv-embedqa-e5-v5:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
eval:
  evaluators:
    rag_accuracy:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: AnswerAccuracy
    rag_groundedness:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ResponseGroundedness
    rag_relevance:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ContextRelevance
    trajectory_accuracy:
      _type: trajectory
      llm_name: nim_trajectory_eval_llm
  general:
    dataset:
      _type: json
      file_path: examples/simple/data/langsmith.json
    output:
      cleanup: true
      dir: ./.tmp/aiq/examples/simple/
    profiler:
      bottleneck_analysis:
        enable_nested_stack: true
      compute_llm_metrics: true
      concurrency_spike_analysis:
        enable: true
        spike_threshold: 7
      csv_exclude_io_text: true
      prompt_caching_prefixes:
        enable: true
        min_frequency: 0.1
      token_uniqueness_forecast: true
      workflow_runtime_forecast: true
functions:
  current_datetime:
    _type: current_datetime
general:
  use_uvloop: true
llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.3-70b-instruct
    temperature: 0.7
  nim_rag_eval_llm:
    _type: nim
    max_tokens: 2
    model_name: meta/llama-3.3-70b-instruct
    temperature: 1.0e-07
    top_p: 0.0001
  nim_trajectory_eval_llm:
    _type: nim
    max_tokens: 1024
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
workflow:
  _type: react_agent
  llm_name: nim_llm
  max_retries: 3
  retry_parsing_errors: true
  tool_names:
    - current_datetime
  verbose: true
2025-04-14 22:59:36,035 - aiq.eval.evaluate - INFO - Starting evaluation run with config file: examples/simple/configs/eval_config.yml
2025-04-14 22:59:36,043 - aiq.eval.evaluate - INFO - Cleaning up output directory .tmp/aiq/examples/simple
2025-04-14 22:59:36,184 - aiq.profiler.decorators - INFO - Langchain callback handler registered
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Filling the prompt variables "tools" and "tool_names", using the tools provided in the config.
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Adding the tools' input schema to the tools' description
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Initialized ReAct Agent Graph
2025-04-14 22:59:36,473 - aiq.agent.react_agent.agent - INFO - ReAct Graph built and compiled successfully
Running workflow:   0%| | 0/3 [00:00<?, ?it/s]2025-04- .......................
The agent's thoughts are:
Thought: Since I don't have the specific tool to search for Langsmith documentation and tutorials, I'll try to provide a general answer based on my knowledge. Langsmith is a platform that allows users to create and test conversational interfaces. To prototype with Langsmith, you can start by creating a new project and defining the conversational flow using their visual interface. You can then add intents, entities, and responses to create a functional conversational interface. Langsmith also provides features like testing and analytics to help you refine your prototype.
Final Answer: To prototype with Langsmith, create a new project, define the conversational flow, add intents, entities, and responses, and use testing and analytics features to refine your prototype.
2025-04-14 22:59:42,047 - aiq.observability.async_otel_listener - INFO - Intermediate step stream completed. No more events will arrive.
Running workflow: 100%|██████████| 3/3 [00:04<00:00,  1.66s/it]
Evaluating Ragas nv_accuracy:   0%| | 0/3 [00:00<?, ?it/s]
2025-04-14 22:59:43,516 - aiq.eval.trajectory_evaluator.evaluate - INFO - Running trajectory evaluation with 3 records
Evaluating Ragas nv_context_relevance: 100%|██████████| 3/3 [00:01<00:00,  1.72it/s]
Evaluating Ragas nv_response_groundedness: 100%|██████████| 3/3 [00:02<00:00,  1.06it/s]
Evaluating Ragas nv_accuracy: 100%|██████████| 3/3 [00:07<00:00,  2.50s/it]
Evaluating Trajectory: 100%|██████████| 3/3 [00:06<00:00,  2.07s/it]
2025-04-14 22:59:49,774 - aiq.profiler.profile_runner - INFO - Wrote combined data to: .tmp/aiq/examples/simple/all_requests_profiler_traces.json
2025-04-14 22:59:49,815 - aiq.profiler.profile_runner - INFO - Wrote merged standardized DataFrame to .tmp/aiq/examples/simple/standardized_data_all.csv
2025-04-14 22:59:49,835 - aiq.profiler.profile_runner - INFO - Wrote inference optimization results to: .tmp/aiq/examples/simple/inference_optimization.json
2025-04-14 22:59:50,271 - aiq.profiler.profile_runner - INFO - Nested stack analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Concurrency spike analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling report to: .tmp/aiq/examples/simple/workflow_profiling_report.txt
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling metrics to: .tmp/aiq/examples/simple/workflow_profiling_metrics.json
2025-04-14 22:59:50,283 - aiq.eval.evaluate - INFO - Workflow output written to
2025-04-14 22:59:50,283 - aiq.eval.utils.output_uploader - INFO - No S3 config provided; skipping upload.
```

</details>

## By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AgentIQ/blob/develop/docs/source/advanced/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Hritik Raj (https://github.com/Hritik003)
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#129

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Closes Issue NVIDIA#78 ## Changes Currently AgentIQ allows you to override options in the config file for the aiq run command, and now with this change we can similarly run the eval command with the override options. cc: @AnuradhaKaruppiah ## Test ``` aiq eval --config_file examples/simple/configs/eval_config.yml \ --override llms.nim_llm.temperature 0.7 \ --override llms.nim_llm.model_name meta/llama-3.3-70b-instruct ``` <details> <summary>Response</summary> ``` 2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.temperature with value: 0.7 with type <class 'float'>) 2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.model_name with value: meta/llama-3.3-70b-instruct with type <class 'str'>) 2025-04-14 22:59:35,968 - aiq.cli.cli_utils.config_override - INFO - Configuration after overrides: embedders: nv-embedqa-e5-v5: _type: nim model_name: nvidia/nv-embedqa-e5-v5 eval: evaluators: rag_accuracy: _type: ragas llm_name: nim_rag_eval_llm metric: AnswerAccuracy rag_groundedness: _type: ragas llm_name: nim_rag_eval_llm metric: ResponseGroundedness rag_relevance: _type: ragas llm_name: nim_rag_eval_llm metric: ContextRelevance trajectory_accuracy: _type: trajectory llm_name: nim_trajectory_eval_llm general: dataset: _type: json file_path: examples/simple/data/langsmith.json output: cleanup: true dir: ./.tmp/aiq/examples/simple/ profiler: bottleneck_analysis: enable_nested_stack: true compute_llm_metrics: true concurrency_spike_analysis: enable: true spike_threshold: 7 csv_exclude_io_text: true prompt_caching_prefixes: enable: true min_frequency: 0.1 token_uniqueness_forecast: true workflow_runtime_forecast: true functions: current_datetime: _type: current_datetime general: use_uvloop: true llms: nim_llm: _type: nim model_name: meta/llama-3.3-70b-instruct temperature: 0.7 nim_rag_eval_llm: _type: nim max_tokens: 2 model_name: 
meta/llama-3.3-70b-instruct temperature: 1.0e-07 top_p: 0.0001 nim_trajectory_eval_llm: _type: nim max_tokens: 1024 model_name: meta/llama-3.1-70b-instruct temperature: 0.0 workflow: _type: react_agent llm_name: nim_llm max_retries: 3 retry_parsing_errors: true tool_names: - current_datetime verbose: true 2025-04-14 22:59:36,035 - aiq.eval.evaluate - INFO - Starting evaluation run with config file: examples/simple/configs/eval_config.yml 2025-04-14 22:59:36,043 - aiq.eval.evaluate - INFO - Cleaning up output directory .tmp/aiq/examples/simple 2025-04-14 22:59:36,184 - aiq.profiler.decorators - INFO - Langchain callback handler registered 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Filling the prompt variables "tools" and "tool_names", using the tools provided in the config. 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Adding the tools' input schema to the tools' description 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Initialized ReAct Agent Graph 2025-04-14 22:59:36,473 - aiq.agent.react_agent.agent - INFO - ReAct Graph built and compiled successfully Running workflow: 0%| | 0/3 [00:00<?, ?it/s]2025-04- ....................... The agent's thoughts are: Thought: Since I don't have the specific tool to search for Langsmith documentation and tutorials, I'll try to provide a general answer based on my knowledge. Langsmith is a platform that allows users to create and test conversational interfaces. To prototype with Langsmith, you can start by creating a new project and defining the conversational flow using their visual interface. You can then add intents, entities, and responses to create a functional conversational interface. Langsmith also provides features like testing and analytics to help you refine your prototype. 
Final Answer: To prototype with Langsmith, create a new project, define the conversational flow, add intents, entities, and responses, and use testing and analytics features to refine your prototype. 2025-04-14 22:59:42,047 - aiq.observability.async_otel_listener - INFO - Intermediate step stream completed. No more events will arrive. Running workflow: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00, 1.66s/it] Evaluating Ragas nv_accuracy: 0%| | 0/3 [00:00<?, ?it/s2025-04-14 22:59:43,516 - aiq.eval.trajectory_evaluator.evaluate - INFO - Running trajectory evaluation with 3 records | 0/3 [00:00<?, ?it/s] Evaluating Ragas nv_context_relevance: 100%|██████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 1.72it/s] Evaluating Ragas nv_response_groundedness: 100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.06it/s] Evaluating Ragas nv_accuracy: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.50s/it] Evaluating Trajectory: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00, 2.07s/it] 2025-04-14 22:59:49,774 - aiq.profiler.profile_runner - INFO - Wrote combined data to: .tmp/aiq/examples/simple/all_requests_profiler_traces.json 2025-04-14 22:59:49,815 - aiq.profiler.profile_runner - INFO - Wrote merged standardized DataFrame to .tmp/aiq/examples/simple/standardized_data_all.csv 2025-04-14 22:59:49,835 - aiq.profiler.profile_runner - INFO - Wrote inference optimization results to: .tmp/aiq/examples/simple/inference_optimization.json 2025-04-14 22:59:50,271 - aiq.profiler.profile_runner - INFO - Nested stack analysis complete 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Concurrency spike analysis complete 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - 
INFO - Wrote workflow profiling report to: .tmp/aiq/examples/simple/workflow_profiling_report.txt 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling metrics to: .tmp/aiq/examples/simple/workflow_profiling_metrics.json 2025-04-14 22:59:50,283 - aiq.eval.evaluate - INFO - Workflow output written to 2025-04-14 22:59:50,283 - aiq.eval.utils.output_uploader - INFO - No S3 config provided; skipping upload. ``` </details> ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AgentIQ/blob/develop/docs/source/advanced/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Hritik Raj (https://github.com/Hritik003) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: NVIDIA#129 Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Closes Issue NVIDIA#78 ## Changes Currently AgentIQ allows you to override options in the config file for the aiq run command, and now with this change we can similarly run the eval command with the override options. cc: @AnuradhaKaruppiah ## Test ``` aiq eval --config_file examples/simple/configs/eval_config.yml \ --override llms.nim_llm.temperature 0.7 \ --override llms.nim_llm.model_name meta/llama-3.3-70b-instruct ``` <details> <summary>Response</summary> ``` 2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.temperature with value: 0.7 with type <class 'float'>) 2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.model_name with value: meta/llama-3.3-70b-instruct with type <class 'str'>) 2025-04-14 22:59:35,968 - aiq.cli.cli_utils.config_override - INFO - Configuration after overrides: embedders: nv-embedqa-e5-v5: _type: nim model_name: nvidia/nv-embedqa-e5-v5 eval: evaluators: rag_accuracy: _type: ragas llm_name: nim_rag_eval_llm metric: AnswerAccuracy rag_groundedness: _type: ragas llm_name: nim_rag_eval_llm metric: ResponseGroundedness rag_relevance: _type: ragas llm_name: nim_rag_eval_llm metric: ContextRelevance trajectory_accuracy: _type: trajectory llm_name: nim_trajectory_eval_llm general: dataset: _type: json file_path: examples/simple/data/langsmith.json output: cleanup: true dir: ./.tmp/aiq/examples/simple/ profiler: bottleneck_analysis: enable_nested_stack: true compute_llm_metrics: true concurrency_spike_analysis: enable: true spike_threshold: 7 csv_exclude_io_text: true prompt_caching_prefixes: enable: true min_frequency: 0.1 token_uniqueness_forecast: true workflow_runtime_forecast: true functions: current_datetime: _type: current_datetime general: use_uvloop: true llms: nim_llm: _type: nim model_name: meta/llama-3.3-70b-instruct temperature: 0.7 nim_rag_eval_llm: _type: nim max_tokens: 2 model_name: 
meta/llama-3.3-70b-instruct temperature: 1.0e-07 top_p: 0.0001 nim_trajectory_eval_llm: _type: nim max_tokens: 1024 model_name: meta/llama-3.1-70b-instruct temperature: 0.0 workflow: _type: react_agent llm_name: nim_llm max_retries: 3 retry_parsing_errors: true tool_names: - current_datetime verbose: true 2025-04-14 22:59:36,035 - aiq.eval.evaluate - INFO - Starting evaluation run with config file: examples/simple/configs/eval_config.yml 2025-04-14 22:59:36,043 - aiq.eval.evaluate - INFO - Cleaning up output directory .tmp/aiq/examples/simple 2025-04-14 22:59:36,184 - aiq.profiler.decorators - INFO - Langchain callback handler registered 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Filling the prompt variables "tools" and "tool_names", using the tools provided in the config. 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Adding the tools' input schema to the tools' description 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Initialized ReAct Agent Graph 2025-04-14 22:59:36,473 - aiq.agent.react_agent.agent - INFO - ReAct Graph built and compiled successfully Running workflow: 0%| | 0/3 [00:00<?, ?it/s]2025-04- ....................... The agent's thoughts are: Thought: Since I don't have the specific tool to search for Langsmith documentation and tutorials, I'll try to provide a general answer based on my knowledge. Langsmith is a platform that allows users to create and test conversational interfaces. To prototype with Langsmith, you can start by creating a new project and defining the conversational flow using their visual interface. You can then add intents, entities, and responses to create a functional conversational interface. Langsmith also provides features like testing and analytics to help you refine your prototype. 
Final Answer: To prototype with Langsmith, create a new project, define the conversational flow, add intents, entities, and responses, and use testing and analytics features to refine your prototype. 2025-04-14 22:59:42,047 - aiq.observability.async_otel_listener - INFO - Intermediate step stream completed. No more events will arrive. Running workflow: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00, 1.66s/it] Evaluating Ragas nv_accuracy: 0%| | 0/3 [00:00<?, ?it/s2025-04-14 22:59:43,516 - aiq.eval.trajectory_evaluator.evaluate - INFO - Running trajectory evaluation with 3 records | 0/3 [00:00<?, ?it/s] Evaluating Ragas nv_context_relevance: 100%|██████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 1.72it/s] Evaluating Ragas nv_response_groundedness: 100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.06it/s] Evaluating Ragas nv_accuracy: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.50s/it] Evaluating Trajectory: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00, 2.07s/it] 2025-04-14 22:59:49,774 - aiq.profiler.profile_runner - INFO - Wrote combined data to: .tmp/aiq/examples/simple/all_requests_profiler_traces.json 2025-04-14 22:59:49,815 - aiq.profiler.profile_runner - INFO - Wrote merged standardized DataFrame to .tmp/aiq/examples/simple/standardized_data_all.csv 2025-04-14 22:59:49,835 - aiq.profiler.profile_runner - INFO - Wrote inference optimization results to: .tmp/aiq/examples/simple/inference_optimization.json 2025-04-14 22:59:50,271 - aiq.profiler.profile_runner - INFO - Nested stack analysis complete 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Concurrency spike analysis complete 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - 
INFO - Wrote workflow profiling report to: .tmp/aiq/examples/simple/workflow_profiling_report.txt 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling metrics to: .tmp/aiq/examples/simple/workflow_profiling_metrics.json 2025-04-14 22:59:50,283 - aiq.eval.evaluate - INFO - Workflow output written to 2025-04-14 22:59:50,283 - aiq.eval.utils.output_uploader - INFO - No S3 config provided; skipping upload. ``` </details> ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AgentIQ/blob/develop/docs/source/advanced/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Hritik Raj (https://github.com/Hritik003) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: NVIDIA#129 Signed-off-by: Yuchen Zhang <134643420+yczhang-nv@users.noreply.github.com>
Closes Issue NVIDIA#78 ## Changes Currently AgentIQ allows you to override options in the config file for the aiq run command, and now with this change we can similarly run the eval command with the override options. cc: @AnuradhaKaruppiah ## Test ``` aiq eval --config_file examples/simple/configs/eval_config.yml \ --override llms.nim_llm.temperature 0.7 \ --override llms.nim_llm.model_name meta/llama-3.3-70b-instruct ``` <details> <summary>Response</summary> ``` 2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.temperature with value: 0.7 with type <class 'float'>) 2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.model_name with value: meta/llama-3.3-70b-instruct with type <class 'str'>) 2025-04-14 22:59:35,968 - aiq.cli.cli_utils.config_override - INFO - Configuration after overrides: embedders: nv-embedqa-e5-v5: _type: nim model_name: nvidia/nv-embedqa-e5-v5 eval: evaluators: rag_accuracy: _type: ragas llm_name: nim_rag_eval_llm metric: AnswerAccuracy rag_groundedness: _type: ragas llm_name: nim_rag_eval_llm metric: ResponseGroundedness rag_relevance: _type: ragas llm_name: nim_rag_eval_llm metric: ContextRelevance trajectory_accuracy: _type: trajectory llm_name: nim_trajectory_eval_llm general: dataset: _type: json file_path: examples/simple/data/langsmith.json output: cleanup: true dir: ./.tmp/aiq/examples/simple/ profiler: bottleneck_analysis: enable_nested_stack: true compute_llm_metrics: true concurrency_spike_analysis: enable: true spike_threshold: 7 csv_exclude_io_text: true prompt_caching_prefixes: enable: true min_frequency: 0.1 token_uniqueness_forecast: true workflow_runtime_forecast: true functions: current_datetime: _type: current_datetime general: use_uvloop: true llms: nim_llm: _type: nim model_name: meta/llama-3.3-70b-instruct temperature: 0.7 nim_rag_eval_llm: _type: nim max_tokens: 2 model_name: 
meta/llama-3.3-70b-instruct temperature: 1.0e-07 top_p: 0.0001 nim_trajectory_eval_llm: _type: nim max_tokens: 1024 model_name: meta/llama-3.1-70b-instruct temperature: 0.0 workflow: _type: react_agent llm_name: nim_llm max_retries: 3 retry_parsing_errors: true tool_names: - current_datetime verbose: true 2025-04-14 22:59:36,035 - aiq.eval.evaluate - INFO - Starting evaluation run with config file: examples/simple/configs/eval_config.yml 2025-04-14 22:59:36,043 - aiq.eval.evaluate - INFO - Cleaning up output directory .tmp/aiq/examples/simple 2025-04-14 22:59:36,184 - aiq.profiler.decorators - INFO - Langchain callback handler registered 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Filling the prompt variables "tools" and "tool_names", using the tools provided in the config. 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Adding the tools' input schema to the tools' description 2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Initialized ReAct Agent Graph 2025-04-14 22:59:36,473 - aiq.agent.react_agent.agent - INFO - ReAct Graph built and compiled successfully Running workflow: 0%| | 0/3 [00:00<?, ?it/s]2025-04- ....................... The agent's thoughts are: Thought: Since I don't have the specific tool to search for Langsmith documentation and tutorials, I'll try to provide a general answer based on my knowledge. Langsmith is a platform that allows users to create and test conversational interfaces. To prototype with Langsmith, you can start by creating a new project and defining the conversational flow using their visual interface. You can then add intents, entities, and responses to create a functional conversational interface. Langsmith also provides features like testing and analytics to help you refine your prototype. 
Final Answer: To prototype with Langsmith, create a new project, define the conversational flow, add intents, entities, and responses, and use testing and analytics features to refine your prototype. 2025-04-14 22:59:42,047 - aiq.observability.async_otel_listener - INFO - Intermediate step stream completed. No more events will arrive. Running workflow: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00, 1.66s/it] Evaluating Ragas nv_accuracy: 0%| | 0/3 [00:00<?, ?it/s2025-04-14 22:59:43,516 - aiq.eval.trajectory_evaluator.evaluate - INFO - Running trajectory evaluation with 3 records | 0/3 [00:00<?, ?it/s] Evaluating Ragas nv_context_relevance: 100%|██████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 1.72it/s] Evaluating Ragas nv_response_groundedness: 100%|██████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.06it/s] Evaluating Ragas nv_accuracy: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.50s/it] Evaluating Trajectory: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00, 2.07s/it] 2025-04-14 22:59:49,774 - aiq.profiler.profile_runner - INFO - Wrote combined data to: .tmp/aiq/examples/simple/all_requests_profiler_traces.json 2025-04-14 22:59:49,815 - aiq.profiler.profile_runner - INFO - Wrote merged standardized DataFrame to .tmp/aiq/examples/simple/standardized_data_all.csv 2025-04-14 22:59:49,835 - aiq.profiler.profile_runner - INFO - Wrote inference optimization results to: .tmp/aiq/examples/simple/inference_optimization.json 2025-04-14 22:59:50,271 - aiq.profiler.profile_runner - INFO - Nested stack analysis complete 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Concurrency spike analysis complete 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - 
INFO - Wrote workflow profiling report to: .tmp/aiq/examples/simple/workflow_profiling_report.txt 2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling metrics to: .tmp/aiq/examples/simple/workflow_profiling_metrics.json 2025-04-14 22:59:50,283 - aiq.eval.evaluate - INFO - Workflow output written to 2025-04-14 22:59:50,283 - aiq.eval.utils.output_uploader - INFO - No S3 config provided; skipping upload. ``` </details> ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AgentIQ/blob/develop/docs/source/advanced/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. Authors: - Hritik Raj (https://github.com/Hritik003) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: NVIDIA#129
Closes Issue NVIDIA#78

## Changes

Currently AgentIQ allows you to override options in the config file for the `aiq run` command; with this change, the `aiq eval` command supports the same override options.

cc: @AnuradhaKaruppiah

## Test

```
aiq eval --config_file examples/simple/configs/eval_config.yml \
  --override llms.nim_llm.temperature 0.7 \
  --override llms.nim_llm.model_name meta/llama-3.3-70b-instruct
```

<details>
<summary>Response</summary>

```
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.temperature with value: 0.7 with type <class 'float'>)
2025-04-14 22:59:35,964 - aiq.cli.cli_utils.config_override - INFO - Successfully set override for llms.nim_llm.model_name with value: meta/llama-3.3-70b-instruct with type <class 'str'>)
2025-04-14 22:59:35,968 - aiq.cli.cli_utils.config_override - INFO - Configuration after overrides:
embedders:
  nv-embedqa-e5-v5:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
eval:
  evaluators:
    rag_accuracy:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: AnswerAccuracy
    rag_groundedness:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ResponseGroundedness
    rag_relevance:
      _type: ragas
      llm_name: nim_rag_eval_llm
      metric: ContextRelevance
    trajectory_accuracy:
      _type: trajectory
      llm_name: nim_trajectory_eval_llm
  general:
    dataset:
      _type: json
      file_path: examples/simple/data/langsmith.json
    output:
      cleanup: true
      dir: ./.tmp/aiq/examples/simple/
    profiler:
      bottleneck_analysis:
        enable_nested_stack: true
      compute_llm_metrics: true
      concurrency_spike_analysis:
        enable: true
        spike_threshold: 7
      csv_exclude_io_text: true
      prompt_caching_prefixes:
        enable: true
        min_frequency: 0.1
      token_uniqueness_forecast: true
      workflow_runtime_forecast: true
functions:
  current_datetime:
    _type: current_datetime
general:
  use_uvloop: true
llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.3-70b-instruct
    temperature: 0.7
  nim_rag_eval_llm:
    _type: nim
    max_tokens: 2
    model_name: meta/llama-3.3-70b-instruct
    temperature: 1.0e-07
    top_p: 0.0001
  nim_trajectory_eval_llm:
    _type: nim
    max_tokens: 1024
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
workflow:
  _type: react_agent
  llm_name: nim_llm
  max_retries: 3
  retry_parsing_errors: true
  tool_names:
  - current_datetime
  verbose: true
2025-04-14 22:59:36,035 - aiq.eval.evaluate - INFO - Starting evaluation run with config file: examples/simple/configs/eval_config.yml
2025-04-14 22:59:36,043 - aiq.eval.evaluate - INFO - Cleaning up output directory .tmp/aiq/examples/simple
2025-04-14 22:59:36,184 - aiq.profiler.decorators - INFO - Langchain callback handler registered
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Filling the prompt variables "tools" and "tool_names", using the tools provided in the config.
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Adding the tools' input schema to the tools' description
2025-04-14 22:59:36,470 - aiq.agent.react_agent.agent - INFO - Initialized ReAct Agent Graph
2025-04-14 22:59:36,473 - aiq.agent.react_agent.agent - INFO - ReAct Graph built and compiled successfully
Running workflow:   0%|          | 0/3 [00:00<?, ?it/s]
.......................
The agent's thoughts are:
Thought: Since I don't have the specific tool to search for Langsmith documentation and tutorials, I'll try to provide a general answer based on my knowledge. Langsmith is a platform that allows users to create and test conversational interfaces. To prototype with Langsmith, you can start by creating a new project and defining the conversational flow using their visual interface. You can then add intents, entities, and responses to create a functional conversational interface. Langsmith also provides features like testing and analytics to help you refine your prototype.
Final Answer: To prototype with Langsmith, create a new project, define the conversational flow, add intents, entities, and responses, and use testing and analytics features to refine your prototype.
2025-04-14 22:59:42,047 - aiq.observability.async_otel_listener - INFO - Intermediate step stream completed. No more events will arrive.
Running workflow: 100%|██████████| 3/3 [00:04<00:00,  1.66s/it]
Evaluating Ragas nv_accuracy:   0%|          | 0/3 [00:00<?, ?it/s]
2025-04-14 22:59:43,516 - aiq.eval.trajectory_evaluator.evaluate - INFO - Running trajectory evaluation with 3 records
Evaluating Ragas nv_context_relevance: 100%|██████████| 3/3 [00:01<00:00,  1.72it/s]
Evaluating Ragas nv_response_groundedness: 100%|██████████| 3/3 [00:02<00:00,  1.06it/s]
Evaluating Ragas nv_accuracy: 100%|██████████| 3/3 [00:07<00:00,  2.50s/it]
Evaluating Trajectory: 100%|██████████| 3/3 [00:06<00:00,  2.07s/it]
2025-04-14 22:59:49,774 - aiq.profiler.profile_runner - INFO - Wrote combined data to: .tmp/aiq/examples/simple/all_requests_profiler_traces.json
2025-04-14 22:59:49,815 - aiq.profiler.profile_runner - INFO - Wrote merged standardized DataFrame to .tmp/aiq/examples/simple/standardized_data_all.csv
2025-04-14 22:59:49,835 - aiq.profiler.profile_runner - INFO - Wrote inference optimization results to: .tmp/aiq/examples/simple/inference_optimization.json
2025-04-14 22:59:50,271 - aiq.profiler.profile_runner - INFO - Nested stack analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Concurrency spike analysis complete
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling report to: .tmp/aiq/examples/simple/workflow_profiling_report.txt
2025-04-14 22:59:50,281 - aiq.profiler.profile_runner - INFO - Wrote workflow profiling metrics to: .tmp/aiq/examples/simple/workflow_profiling_metrics.json
2025-04-14 22:59:50,283 - aiq.eval.evaluate - INFO - Workflow output written to
2025-04-14 22:59:50,283 - aiq.eval.utils.output_uploader - INFO - No S3 config provided; skipping upload.
```

</details>

## By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AgentIQ/blob/develop/docs/source/advanced/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- Hritik Raj (https://github.com/Hritik003)
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#129
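For readers unfamiliar with the feature: each `--override` takes a dot-delimited path into the YAML config plus a raw string value, and the value's type is inferred before it is set (note the `<class 'float'>` and `<class 'str'>` lines in the log above). The actual implementation lives in `aiq.cli.cli_utils.config_override` and is not shown here; the following is only a hypothetical sketch of the general mechanism, with made-up helper and variable names:

```python
from typing import Any


def apply_override(config: dict, path: str, raw_value: str) -> None:
    """Set a dot-delimited key in a nested config dict, inferring the value type.

    Illustrative only -- not AgentIQ's actual override implementation.
    """
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        # Walk (or create) the intermediate mappings along the dot path.
        node = node.setdefault(key, {})
    # Infer a concrete type from the raw CLI string, falling back to str.
    value: Any = raw_value
    for cast in (int, float):
        try:
            value = cast(raw_value)
            break
        except ValueError:
            continue
    node[leaf] = value


# Mirror the overrides from the test command above.
config = {"llms": {"nim_llm": {"_type": "nim", "temperature": 0.0}}}
apply_override(config, "llms.nim_llm.temperature", "0.7")
apply_override(config, "llms.nim_llm.model_name", "meta/llama-3.3-70b-instruct")
print(config["llms"]["nim_llm"]["temperature"])  # 0.7 (a float, not the string "0.7")
```

The real CLI additionally validates the resulting configuration after applying overrides, so a typo in the dot path or an invalid value fails fast rather than silently producing a misconfigured run.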