KEMBAR78
Add Job ID Appending to Output Directories and Maximum Folders Threshold by ZhongxuanWang · Pull Request #331 · NVIDIA/NeMo-Agent-Toolkit · GitHub
Skip to content

Conversation

@ZhongxuanWang
Copy link
Contributor

Description

Closes #325

This PR adds a new feature to append unique job IDs to output directories in the evaluation configuration. This helps prevent overwriting results across multiple runs by ensuring each run has a unique output directory.

  1. Added new configuration option append_job_id_to_output_dir to EvalOutputConfig
  2. Implemented job ID generation logic:
    • Generates UUID when feature is enabled and no job ID provided
    • Uses custom job ID when provided
    • No job ID generation when feature is disabled
  3. Added comprehensive test coverage for all scenarios
  4. Added documentation for the new feature

This PR is also backward compatible by making the append_job_id_to_output_dir optional.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
@copy-pr-bot
Copy link

copy-pr-bot bot commented May 31, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ZhongxuanWang
Copy link
Contributor Author

@AnuradhaKaruppiah Hi Ms. Karuppiah, could you review my first PR when you get a chance? Thank you!

@ZhongxuanWang ZhongxuanWang force-pushed the develop branch 2 times, most recently from 3a280fe to 7324ff9 Compare June 1, 2025 21:57
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
@AnuradhaKaruppiah AnuradhaKaruppiah added improvement Improvement to existing functionality non-breaking Non-breaking change labels Jun 10, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds functionality to append unique job IDs to output directories to avoid overwriting results across runs.

  • Introduces append_job_id_to_output_dir in the evaluation output config.
  • Implements UUID-based job ID generation in run_and_evaluate.
  • Updates tests and documentation to cover and describe the new feature.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
tests/aiq/eval/test_evaluate.py Adds unit tests that simulate job ID generation logic
src/aiq/eval/evaluate.py Inserts logic to generate and log a job ID when enabled
src/aiq/data_models/evaluate.py Defines the new append_job_id_to_output_dir field
docs/source/reference/evaluate.md Documents usage of the new output-dir option
Comments suppressed due to low confidence (1)

docs/source/reference/evaluate.md:455

  • The documentation uses JOB_{UUID} (uppercase) but the implementation prefixes with job_ (lowercase). Update the example to match the actual lowercase prefix for consistency.
When `append_job_id_to_output_dir` is set to `true`, a unique job ID (`JOB_{UUID}`) is automatically generated for each evaluation run and appended to the output directory path.

Copy link
Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @ZhongxuanWang

ZhongxuanWang and others added 5 commits June 10, 2025 11:12
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zhongxuan (Daniel) Wang <cndanielwang@gmail.com>
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
@ZhongxuanWang
Copy link
Contributor Author

ZhongxuanWang commented Jun 11, 2025

In the new commits, I just made the two new parameters for EvalOutputConfig, namely

    # Maximum number of jobs to keep in the output directory. Oldest jobs will be evicted.
    # A value of 0 or None means no limit.
    max_jobs: int | None = None
    # Policy for evicting old jobs. Defaults to using time_created.
    eviction_policy: JobEvictionPolicy = JobEvictionPolicy.TIME_CREATED # or JobEvictionPolicy.TIME_MODIFIED

I also added corresponding logics in evaluate.py

Now, the uses can choose whether the output folder removal policy should be either based on the time the folders are created (TIME_CREATED) or the folders are modified (TIME_MODIFIED).

I also made corresponding unit tests. I also made the parameters to have a default value so they will be backward compatible.

@ZhongxuanWang ZhongxuanWang changed the title Add Job ID Appending to Output Directories Add Job ID Appending to Output Directories and Maximum Folders Threshold Jun 11, 2025
Signed-off-by: zhongxuanwang <cndanielwang@gmail.com>
@ZhongxuanWang
Copy link
Contributor Author

ZhongxuanWang commented Jun 13, 2025

I accidentally missed changing the documentation during adding my 2nd feature. Now the documentation is updated with the two new parameters max_jobs and eviction_policy

@AnuradhaKaruppiah
Copy link
Contributor

/ok to test d78530c

@AnuradhaKaruppiah
Copy link
Contributor

/ok to test 5deaf25

This commit:
1. Adds a job_management subsecion to group job config
2. Fixes a problem with the parent directory searched for eviction
3. Updates docs to reflect these changes

Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
@AnuradhaKaruppiah
Copy link
Contributor

/ok to test 4ce53c6

@AnuradhaKaruppiah
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 9287c5c into NVIDIA:develop Jun 17, 2025
12 checks passed
@AnuradhaKaruppiah
Copy link
Contributor

Merged, thx for the contribution @ZhongxuanWang

AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
…old (NVIDIA#331)

Closes NVIDIA#325 

This PR adds a new feature to append unique job IDs to output directories in the evaluation configuration. This helps prevent overwriting results across multiple runs by ensuring each run has a unique output directory.

1. Added new configuration option `append_job_id_to_output_dir` to `EvalOutputConfig`
2. Implemented job ID generation logic:
   - Generates UUID when feature is enabled and no job ID provided
   - Uses custom job ID when provided
   - No job ID generation when feature is disabled
3. Added comprehensive test coverage for all scenarios
4. Added documentation for the new feature

This PR is also backward compatible by making the `append_job_id_to_output_dir` optional.


## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Zhongxuan (Daniel) Wang (https://github.com/ZhongxuanWang)
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#331
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025
…old (NVIDIA#331)

Closes NVIDIA#325 

This PR adds a new feature to append unique job IDs to output directories in the evaluation configuration. This helps prevent overwriting results across multiple runs by ensuring each run has a unique output directory.

1. Added new configuration option `append_job_id_to_output_dir` to `EvalOutputConfig`
2. Implemented job ID generation logic:
   - Generates UUID when feature is enabled and no job ID provided
   - Uses custom job ID when provided
   - No job ID generation when feature is disabled
3. Added comprehensive test coverage for all scenarios
4. Added documentation for the new feature

This PR is also backward compatible by making the `append_job_id_to_output_dir` optional.


## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - Zhongxuan (Daniel) Wang (https://github.com/ZhongxuanWang)
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#331
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improvement to existing functionality non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA]: Evaluation overwrites the files in the output directory

2 participants