Add an async version of the /generate endpoint #315
Signed-off-by: David Gardner <dagardner@nvidia.com>
…g dir, instead simply require a yaml file extension Signed-off-by: David Gardner <dagardner@nvidia.com>
…nc-generate-223 Signed-off-by: David Gardner <dagardner@nvidia.com>
LGTM
This should be left as one. Evaluation runs the workflow and evaluators concurrently, and that is managed via a `max_concurrency` config in the YAML file. Adding another layer of concurrency on top of it may make the resource requirements harder to predict. For 4 and 5, I agree. They should be addressed separately once the design is finalized.
Co-authored-by: Anuradha Karuppiah <anuradha.karuppiah@gmail.com> Signed-off-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
/merge
* Adds two new endpoints:
- `/generate/async`
- `/generate/async/job/{job_id}`
* Adds a `--max_running_async_jobs` flag to `aiq serve` to control the number of concurrent async generate jobs.
* Similar to the evaluate API, the `/generate/async` endpoint optionally accepts a user-assigned `job_id` (otherwise one is generated) and an `expiry_seconds` value (defaults to one hour).
* Since the payload body of the `/generate` endpoint (`GenerateBodyType`) is determined dynamically, so too is the payload for the async endpoint. However, this makes `job_id`, `sync_timeout`, and `expiry_seconds` effectively *reserved* fields.
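To illustrate what "reserved" means here, the following is a minimal sketch, not the PR's implementation. It assumes a hypothetical helper, `split_async_payload`, that separates the three job-control fields named above from whatever the workflow's own input schema defines. A workflow whose input already uses one of these names would collide with the async controls.

```python
from typing import Any

# Field names come from the PR description; the helper itself is hypothetical.
RESERVED_FIELDS = ("job_id", "sync_timeout", "expiry_seconds")


def split_async_payload(payload: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate job-control fields from the fields forwarded to the workflow."""
    control = {k: v for k, v in payload.items() if k in RESERVED_FIELDS}
    workflow_input = {k: v for k, v in payload.items() if k not in RESERVED_FIELDS}
    return control, workflow_input


control, workflow_input = split_async_payload(
    {"input_message": "What time is it?", "job_id": "1"}
)
print(control)         # {'job_id': '1'}
print(workflow_input)  # {'input_message': 'What time is it?'}
```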
Usage example:
Terminal 1:
```bash
aiq serve --config_file=examples/simple/configs/config.yml
```
Terminal 2:
```bash
curl -X 'POST' \
'http://localhost:8000/generate/async' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"input_message": "What time is it?", "job_id": "1"}' | jq
curl --request GET --url http://localhost:8000/generate/async/job/1
```
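The same two calls can be sketched in Python using only the standard library. The URLs, headers, and payload mirror the curl commands above. The function names are illustrative, and the response schema is not described in this PR, so the sketch only decodes the JSON body rather than assuming any particular fields.

```python
import json
import urllib.parse
import urllib.request

# Host and port taken from the curl example above.
BASE_URL = "http://localhost:8000"


def build_submit_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that submits an async generate job."""
    return urllib.request.Request(
        f"{base_url}/generate/async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(job_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request that looks up a previously submitted job."""
    quoted = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(f"{base_url}/generate/async/job/{quoted}", method="GET")


def send(request: urllib.request.Request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def main() -> None:
    # Requires a server started with `aiq serve` as shown in Terminal 1.
    print(send(build_submit_request({"input_message": "What time is it?", "job_id": "1"})))
    print(send(build_status_request("1")))
```

Calling `main()` performs the network round trip; the request builders can be inspected without a running server.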
#### Open Questions
1) Do we need a GET variant for the `/generate/async` endpoint? I see this for other endpoints, but it doesn't really make sense here.
2) Do we need an OpenAI variant? If so, it would be enabled with `"background": true` in the payload body; however, we currently don't enable streaming this way.
3) Currently async evaluations are limited to one concurrent job. Should this be updated to use the `max_running_async_jobs` config value?
4) Is issue NVIDIA#313 outside the scope of this PR, or a requirement? I'm currently assuming that it is outside the scope.
5) There is currently no mechanism for canceling long-running jobs, which may be stuck in an endless loop. I held off on adding this as it started to feel like we were going beyond the scope of AIQ, and that rather than using Starlette background tasks we should perhaps use something like Dask or Celery.
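For discussion of questions 3 and 5, here is a rough sketch of how a concurrency cap and a per-job timeout could interact. It uses plain `asyncio` (a semaphore plus `asyncio.wait_for`) rather than the Starlette background tasks this PR uses, and every name in it, including the status strings, is hypothetical. Simulated jobs stand in for real workflow runs.

```python
import asyncio


async def run_job(job_id: str, seconds: float, limiter: asyncio.Semaphore, timeout: float):
    """Run one simulated job while holding a slot from the limiter."""
    async with limiter:
        try:
            # asyncio.sleep stands in for the real workflow invocation.
            await asyncio.wait_for(asyncio.sleep(seconds), timeout=timeout)
            return job_id, "success"
        except asyncio.TimeoutError:
            return job_id, "timed_out"


async def main(max_running_async_jobs: int = 2, timeout: float = 0.05) -> dict:
    # At most `max_running_async_jobs` jobs hold a slot at any moment.
    limiter = asyncio.Semaphore(max_running_async_jobs)
    jobs = {"fast-1": 0.01, "fast-2": 0.01, "stuck": 10.0}
    pairs = await asyncio.gather(
        *(run_job(job_id, seconds, limiter, timeout) for job_id, seconds in jobs.items())
    )
    return dict(pairs)


results = asyncio.run(main())
print(results)
```

Note that `asyncio.wait_for` can only cancel a job at an `await` point, so a job spinning in a CPU-bound loop would not be interrupted this way. That limitation is part of why an external task system may be the better fit.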
Closes NVIDIA#223
## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/AIQToolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.
Authors:
- David Gardner (https://github.com/dagardner-nv)
Approvers:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)
URL: NVIDIA#315
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>