Add an async version of the /generate endpoint by dagardner-nv · Pull Request #315 · NVIDIA/NeMo-Agent-Toolkit · GitHub

@dagardner-nv dagardner-nv commented May 23, 2025

Description

  • Adds two new endpoints:

    • /generate/async
    • /generate/async/job/{job_id}
  • Adds a --max_running_async_jobs flag to aiq serve to control the number of concurrent async generate jobs.

  • Similar to the evaluate API, the /generate/async endpoint optionally accepts a user-assigned job_id (otherwise one is generated) and an expiry_seconds value (defaults to one hour).

  • Because the payload body of the /generate endpoint (GenerateBodyType) is determined dynamically, so is the payload for the async endpoint. As a result, job_id, sync_timeout, and expiry_seconds are effectively reserved fields.
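
To illustrate what "reserved" means for callers, here is a minimal client-side sketch. It is not code from this PR: `build_async_payload` and `RESERVED_FIELDS` are hypothetical names, and the only facts taken from the description are the three field names. The helper refuses workflow inputs that collide with a reserved name before the request body is assembled.

```python
from typing import Any, Mapping, Optional

# Field names listed as reserved in the description above; every other key is
# assumed to belong to the workflow's dynamically generated input schema.
RESERVED_FIELDS = frozenset({"job_id", "sync_timeout", "expiry_seconds"})


def build_async_payload(workflow_input: Mapping[str, Any],
                        job_id: Optional[str] = None,
                        expiry_seconds: Optional[int] = None) -> dict:
    """Merge workflow input with the optional async job fields (hypothetical helper)."""
    clashes = RESERVED_FIELDS.intersection(workflow_input)
    if clashes:
        raise ValueError(f"workflow input uses reserved field(s): {sorted(clashes)}")

    payload = dict(workflow_input)
    # Only send the optional fields when the caller set them, so the server
    # can fall back to its own defaults.
    if job_id is not None:
        payload["job_id"] = job_id
    if expiry_seconds is not None:
        payload["expiry_seconds"] = expiry_seconds
    return payload
```

Calling `build_async_payload({"input_message": "What time is it?"}, job_id="1")` produces the same body as the curl example, while a workflow whose input schema already defines `job_id` fails fast on the client.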

Usage example:
Terminal 1:

aiq serve --config_file=examples/simple/configs/config.yml

Terminal 2:

curl -X 'POST' \
  'http://localhost:8000/generate/async' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"input_message": "What time is it?", "job_id": "1"}' | jq

curl --request GET --url http://localhost:8000/generate/async/job/1
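
The same two requests can be driven from Python using only the standard library. This is a rough sketch rather than an official client: the response schema is not shown in this PR, so the caller supplies the `is_finished` predicate, and the helper names (`submit_job`, `fetch_job`, `poll_job`) are made up for illustration.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://localhost:8000"  # same host and port as the curl example


def submit_job(body: Dict[str, Any]) -> Dict[str, Any]:
    """POST the body to /generate/async and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{BASE_URL}/generate/async",
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_job(job_id: str) -> Dict[str, Any]:
    """GET /generate/async/job/{job_id} and return the decoded JSON reply."""
    with urllib.request.urlopen(f"{BASE_URL}/generate/async/job/{job_id}") as response:
        return json.load(response)


def poll_job(job_id: str,
             is_finished: Callable[[Dict[str, Any]], bool],
             fetch: Callable[[str], Dict[str, Any]] = fetch_job,
             interval: float = 1.0,
             timeout: float = 60.0,
             sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Fetch the job repeatedly until is_finished(job) is true or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if is_finished(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

With the server from Terminal 1 running, `submit_job({"input_message": "What time is it?", "job_id": "1"})` followed by `poll_job("1", is_finished=...)` mirrors the curl calls. The predicate has to be written against whatever fields the job endpoint actually returns.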

Open Questions

  1. Do we need a GET variant for the /generate/async endpoint? I see this for other endpoints, but it doesn't really make sense here.
  2. Do we need an OpenAI variant? If so, it would be enabled with "background": true in the payload body; however, we currently don't enable stream this way.
  3. Currently, async evaluations are limited to one concurrent job. Should this be updated to use the max_running_async_jobs config value?
  4. Is issue #313 ([BUG]: evaluate API returns inconsistent results when workers > 1) outside the scope of this PR, or a requirement? I'm currently assuming that it is outside the scope.
  5. There is currently no mechanism for canceling long-running jobs that may be stuck in an endless loop. I held off on adding this because it started to feel like we were going beyond the scope of AIQ, and that perhaps, rather than using Starlette background tasks, we should use something like Dask or Celery.
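
For discussion of questions 3 and 5, a cap such as max_running_async_jobs and a per-job timeout can be sketched with plain asyncio. This is only an illustration of the general technique, not how this PR implements the limit; `BoundedJobRunner` and its attributes are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class BoundedJobRunner:
    """Run coroutines with at most `max_running` active at once (illustrative only)."""

    def __init__(self, max_running: int, job_timeout: Optional[float] = None):
        self._semaphore = asyncio.Semaphore(max_running)
        self._job_timeout = job_timeout
        self.running = 0        # jobs currently holding a slot
        self.peak_running = 0   # highest value `running` has reached

    async def run(self, job: Callable[[], Awaitable[Any]]) -> Any:
        # Jobs beyond the cap wait here until a slot is released.
        async with self._semaphore:
            self.running += 1
            self.peak_running = max(self.peak_running, self.running)
            try:
                # wait_for cancels the job once the timeout elapses;
                # a timeout of None waits indefinitely.
                return await asyncio.wait_for(job(), timeout=self._job_timeout)
            finally:
                self.running -= 1
```

Create the runner inside the running event loop, for example within the coroutine passed to `asyncio.run`. Note that cancellation is only delivered at an `await`, so a job spinning in blocking code would not be stopped by the timeout, which is part of why a task queue such as Dask or Celery comes up in question 5.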

Closes #223

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…g dir, instead simply require a yaml file extension

@dagardner-nv self-assigned this on May 23, 2025
@dagardner-nv added the feature request, skip-ci, and non-breaking labels on May 23, 2025
@dagardner-nv marked this pull request as draft on May 23, 2025 22:51
@dagardner-nv removed the skip-ci label on May 28, 2025
@dagardner-nv marked this pull request as ready for review on May 28, 2025 16:24

@AnuradhaKaruppiah AnuradhaKaruppiah left a comment

LGTM

AnuradhaKaruppiah commented Jun 2, 2025

Open Questions

  3. Currently, async evaluations are limited to one concurrent job. Should this be updated to use the max_running_async_jobs config value?

This should be left as one. Evaluation runs the workflow and evaluators concurrently, and that is managed via a max_concurrency config in the YAML file. Adding another layer of concurrency on top of it may make it harder to predict the resource requirements.

For 4 & 5, I agree. They should be addressed separately once the design is finalized.

Co-authored-by: Anuradha Karuppiah <anuradha.karuppiah@gmail.com>
Signed-off-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
@dagardner-nv

/merge

@rapids-bot merged commit 70f3439 into NVIDIA:develop on Jun 3, 2025
12 checks passed
@dagardner-nv deleted the david-async-generate-223 branch on June 3, 2025 16:56
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#315
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
ericevans-nv pushed a commit to ericevans-nv/agent-iq that referenced this pull request Jun 3, 2025
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025
Labels: feature request, non-breaking

Linked issue: [FEA]: Provide an async version of the /generate endpoint