Add an async version of the /generate endpoint #315

dagardner-nv · 2025-05-23T22:51:32Z

Description

Adds two new endpoints:
- /generate/async
- /generate/async/job/{job_id}
Adds a --max_running_async_jobs flag to aiq serve to control the number of concurrent async generate jobs.
Similar to the evaluate API, the /generate/async endpoint optionally accepts a user-assigned job_id (otherwise one is generated) and an expiry_seconds value (defaults to one hour).
Since the payload body of the /generate endpoint (GenerateBodyType) is determined dynamically, so too is the payload for the async endpoint. This makes job_id, sync_timeout, and expiry_seconds effectively reserved fields.
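
To illustrate what "reserved" means here, the following is a minimal sketch (not the code in this PR) of separating the three reserved names from the rest of a dynamically shaped request body. The helper name `split_async_payload`, the use of a UUID for generated job IDs, and the `sync_timeout` default of 0 are assumptions made for illustration only.

```python
import uuid

# Field names come from the PR description; everything else is illustrative.
RESERVED_FIELDS = ("job_id", "sync_timeout", "expiry_seconds")
DEFAULT_EXPIRY_SECONDS = 3600  # the description says the default is an hour


def split_async_payload(body: dict) -> tuple[dict, dict]:
    """Separate the reserved async options from the workflow's own input fields."""
    options = {name: body[name] for name in RESERVED_FIELDS if name in body}
    workflow_input = {k: v for k, v in body.items() if k not in RESERVED_FIELDS}

    # Assumption: a missing job_id is replaced by a random UUID string.
    options.setdefault("job_id", str(uuid.uuid4()))
    options.setdefault("expiry_seconds", DEFAULT_EXPIRY_SECONDS)
    # Assumption: a sync_timeout of 0 means "do not wait for the result".
    options.setdefault("sync_timeout", 0)
    return options, workflow_input


options, workflow_input = split_async_payload(
    {"input_message": "What time is it?", "job_id": "1"}
)
print(options)
print(workflow_input)
```

A workflow whose own input schema already has a field named job_id, sync_timeout, or expiry_seconds would collide with these options, which is why the names are effectively reserved.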

Usage example:
Terminal 1:

aiq serve --config_file=examples/simple/configs/config.yml

Terminal 2:

curl -X 'POST' \
  'http://localhost:8000/generate/async' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"input_message": "What time is it?", "job_id": "1"}' | jq

curl --request GET --url http://localhost:8000/generate/async/job/1
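
The same two requests can be issued from Python. The sketch below uses only the standard library and the two URLs shown above. The shape of the status response is not described in this PR, so the caller supplies an `is_done` predicate; the function names `http_json` and `run_async_job` are hypothetical helpers, not part of AIQ.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"accept": "application/json", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_async_job(
    base_url: str,
    payload: dict,
    is_done: Callable[[dict], bool],
    send: Callable[..., dict] = http_json,
    poll_interval: float = 1.0,
    max_polls: int = 30,
) -> dict:
    """Submit a job to /generate/async, then poll its status URL until is_done is true."""
    job_id = payload["job_id"]  # caller-assigned, as in the curl example
    send("POST", f"{base_url}/generate/async", payload)

    status_url = f"{base_url}/generate/async/job/{job_id}"
    for _ in range(max_polls):
        reply = send("GET", status_url)
        if is_done(reply):
            return reply
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

With the server from Terminal 1 running, a call might look like `run_async_job("http://localhost:8000", {"input_message": "What time is it?", "job_id": "1"}, is_done=...)`, where the predicate inspects whichever field of the status response indicates completion.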

Open Questions

1) Do we need a GET variant for the /generate/async endpoint? I see this for other endpoints, but it doesn't really make sense here.
2) Do we need an OpenAI variant? If so, it would be enabled with "background": true in the payload body; however, we currently don't enable stream in this way.
3) Currently async evaluations are limited to one concurrent job. Should this be updated to use the max_running_async_jobs config value?
4) Is issue #313 ([BUG]: evaluate API returns inconsistent results when workers > 1) outside the scope of this PR, or a requirement? I'm currently assuming that it is outside the scope.
5) There is currently no mechanism for canceling long-running jobs, which may be stuck in an endless loop. I held off on adding this as it started to feel like we were going beyond the scope of AIQ, and that rather than using Starlette background tasks we should maybe use something like Dask or Celery.
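
For context on questions 3 and 5, here is a generic asyncio sketch of capping the number of concurrently running jobs and bounding each one with a timeout. It is not how this PR implements --max_running_async_jobs; the function `run_bounded` and its parameters are hypothetical.

```python
import asyncio


async def run_bounded(job_factories, max_running: int, timeout_seconds: float):
    """Run jobs with at most `max_running` active at once, each under a timeout."""
    semaphore = asyncio.Semaphore(max_running)

    async def guarded(job_factory):
        # Jobs wait here until one of the `max_running` slots is free.
        async with semaphore:
            try:
                return await asyncio.wait_for(job_factory(), timeout=timeout_seconds)
            except asyncio.TimeoutError:
                return "timed out"

    return await asyncio.gather(*(guarded(factory) for factory in job_factories))


async def demo():
    running = 0
    peak = 0

    async def quick():
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        running -= 1
        return "ok"

    async def stuck():
        await asyncio.sleep(60)  # stands in for a job that never finishes

    results = await run_bounded(
        [quick, quick, quick, stuck], max_running=2, timeout_seconds=0.1
    )
    return results, peak


results, peak = asyncio.run(demo())
print(results, peak)
```

Note that `asyncio.wait_for` can only cancel a job at an await point. A job spinning in a CPU-bound loop never yields, so it cannot be stopped this way, which is one reason an external task runner such as Dask or Celery comes up in question 5.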

Closes #223

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.


AnuradhaKaruppiah

LGTM

src/aiq/front_ends/fastapi/fastapi_front_end_config.py

AnuradhaKaruppiah · 2025-06-02T23:12:39Z

Open Questions

Currently async evaluations are limited to one concurrent job, should this be updated to use the max_running_async_jobs config value?

This should be left as one. Evaluation runs the workflow and evaluators concurrently, and that is managed via a max_concurrency config in the YAML file. Adding another layer of concurrency on top of it may make the resource requirements harder to predict.

For 4 & 5, I agree. They should be addressed separately once the design is finalized.


dagardner-nv · 2025-06-03T16:56:19Z

/merge

Closes NVIDIA#223

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- David Gardner (https://github.com/dagardner-nv)

Approvers:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#315


dagardner-nv added 23 commits May 22, 2025 11:33

Validator for the job_id field

f300225

Signed-off-by: David Gardner <dagardner@nvidia.com>

Validator for the config_path field

0eca2df

Signed-off-by: David Gardner <dagardner@nvidia.com>

Lint fix

c0d8055

Signed-off-by: David Gardner <dagardner@nvidia.com>

simple integer validations:

e6a7e64

Signed-off-by: David Gardner <dagardner@nvidia.com>

Fix tests

d9ebf51

Signed-off-by: David Gardner <dagardner@nvidia.com>

Add new tests

1bebb90

Signed-off-by: David Gardner <dagardner@nvidia.com>

WIP: just copy/pasting from evaluation endpoint

7b19f1a

Signed-off-by: David Gardner <dagardner@nvidia.com>

Remove out of date comment

020fb89

Signed-off-by: David Gardner <dagardner@nvidia.com>

Add an optional field

48dcdfd

Signed-off-by: David Gardner <dagardner@nvidia.com>

New config classes

6a1dcbc

Signed-off-by: David Gardner <dagardner@nvidia.com>

Replace requiring the config file be located under the current workin…

f4ad761

…g dir, instead simply require a yaml file extension Signed-off-by: David Gardner <dagardner@nvidia.com>

Update tests

534537d

Signed-off-by: David Gardner <dagardner@nvidia.com>

WIP

50eb990

Signed-off-by: David Gardner <dagardner@nvidia.com>

Refactor the create_cleanup_task method to be shared

a6272dc

Signed-off-by: David Gardner <dagardner@nvidia.com>

WIP

8583ca5

Signed-off-by: David Gardner <dagardner@nvidia.com>

Merge branch 'david-val-job-id' into david-async-generate-223

e5bc660

Merge branch 'develop' of github.com:NVIDIA/AIQToolkit into david-asy…

bc689ab

…nc-generate-223 Signed-off-by: David Gardner <dagardner@nvidia.com>

Syntax fix

8fcc1ca

Signed-off-by: David Gardner <dagardner@nvidia.com>

WIP

8504bce

Signed-off-by: David Gardner <dagardner@nvidia.com>

WIP

c20d746

Signed-off-by: David Gardner <dagardner@nvidia.com>

WIP

c1a5aa6

Signed-off-by: David Gardner <dagardner@nvidia.com>

Fix definition of the job status endpoint

5936a61

Signed-off-by: David Gardner <dagardner@nvidia.com>

Better handling of the request and response model types

9855a49

Signed-off-by: David Gardner <dagardner@nvidia.com>

dagardner-nv self-assigned this May 23, 2025

dagardner-nv added the feature request, skip-ci, and non-breaking labels May 23, 2025

dagardner-nv marked this pull request as draft May 23, 2025 22:51

dagardner-nv added 2 commits May 23, 2025 16:49

Implement the sync_timeout feature

392023c

Signed-off-by: David Gardner <dagardner@nvidia.com>

Re-order imports

7c364ec

Signed-off-by: David Gardner <dagardner@nvidia.com>

dagardner-nv added 11 commits May 23, 2025 16:58

Lint fix

945de8e

Signed-off-by: David Gardner <dagardner@nvidia.com>

Remove comment

128fccc

Signed-off-by: David Gardner <dagardner@nvidia.com>

Add some locking around the job store

113a3db

Signed-off-by: David Gardner <dagardner@nvidia.com>

Log the expected /generate payload body on startup

e9cba01

Signed-off-by: David Gardner <dagardner@nvidia.com>

Merge branch 'develop' of github.com:NVIDIA/AIQToolkit into david-asy…

3109014

…nc-generate-223 Signed-off-by: David Gardner <dagardner@nvidia.com>

Allow up to 10 jobs to run concurrently

c3475ed

Signed-off-by: David Gardner <dagardner@nvidia.com>

Replace hard-coded concurrency value with a command line flag

447fcbb

Signed-off-by: David Gardner <dagardner@nvidia.com>

Merge branch 'david-async-generate-223-concur' into david-async-gener…

f7c25f2

…ate-223

Merge branch 'develop' of github.com:NVIDIA/AIQToolkit into david-asy…

1be5baf

…nc-generate-223 Signed-off-by: David Gardner <dagardner@nvidia.com>

Add test for async generation

a248752

Signed-off-by: David Gardner <dagardner@nvidia.com>

Add a test for a non-existent job status

5c74dd6

Signed-off-by: David Gardner <dagardner@nvidia.com>

dagardner-nv removed the skip-ci label May 28, 2025

dagardner-nv marked this pull request as ready for review May 28, 2025 16:24

dagardner-nv added 4 commits May 28, 2025 09:25

WIP

8e7253f

Signed-off-by: David Gardner <dagardner@nvidia.com>

Declare gunicorn as an optional dep

c99fb4f

Signed-off-by: David Gardner <dagardner@nvidia.com>

Merge branch 'develop' of github.com:NVIDIA/AIQToolkit into david-asy…

4fd537b

…nc-generate-223 Signed-off-by: David Gardner <dagardner@nvidia.com>

Merge branch 'develop' of github.com:NVIDIA/AIQToolkit into david-asy…

e770e4a

…nc-generate-223 Signed-off-by: David Gardner <dagardner@nvidia.com>

AnuradhaKaruppiah approved these changes Jun 2, 2025

src/aiq/front_ends/fastapi/fastapi_front_end_config.py (outdated, resolved)

src/aiq/front_ends/fastapi/fastapi_front_end_config.py (outdated, resolved)

Apply suggestions from code review

2c149e8

Co-authored-by: Anuradha Karuppiah <anuradha.karuppiah@gmail.com> Signed-off-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>

rapids-bot bot merged commit 70f3439 into NVIDIA:develop Jun 3, 2025
12 checks passed

dagardner-nv deleted the david-async-generate-223 branch June 3, 2025 16:56
