server : add OAI compat for /v1/completions by ngxson · Pull Request #10974 · ggml-org/llama.cpp · GitHub

@ngxson
Collaborator

@ngxson ngxson commented Dec 25, 2024

Supersedes #10645

Ref documentation: https://platform.openai.com/docs/api-reference/completions/object

The /v1/completions endpoint is now OAI-compatible (not to be confused with the /completion endpoint, which has no /v1 prefix).
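As a rough illustration of what OAI compatibility means for a client, here is a minimal sketch that builds a request for /v1/completions and reads the generated text from an OpenAI-style completion object (`choices[0].text`). The base URL, the `"default"` model value, and the helper names are assumptions made for this example; they are not taken from the PR.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this address; adjust as needed.
BASE_URL = "http://localhost:8080"


def build_completion_request(prompt, max_tokens=16, model="default"):
    """Build a POST request in the OpenAI completions request shape.

    The "model" value is a placeholder chosen for this sketch.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Return the text of the first choice of a completion object, or ""."""
    choices = response_body.get("choices", [])
    return choices[0]["text"] if choices else ""


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

With a server running, `complete("Once upon a time", max_tokens=32)` would return the generated continuation as a string.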

Also regrouped the docs to have 2 dedicated sections: one for OAI-compat API and one for non-OAI API

TODO:

  • add test
  • add docs

@github-actions github-actions bot added the python python script changes label Dec 25, 2024
@ngxson ngxson added the breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. label Dec 25, 2024
@ngxson ngxson marked this pull request as ready for review December 25, 2024 16:05
@ngxson ngxson requested a review from ggerganov December 25, 2024 16:05
@ericcurtin
Collaborator

Would this make llama-server compatible with this client?

https://github.com/open-webui/open-webui

If yes, can we please get this in? 😄

I'm also curious, for anyone in the know: a lot of the OpenAI clients (like open-webui) seem to expect to be able to switch models per request. Does llama-server support this and if not, what would be the effort to add that roughly?

@ggerganov
Member

> Does llama-server support this and if not, what would be the effort to add that roughly?

This is not supported atm. But this logic seems like something more suitable for a proxy/routing layer rather than implementing it in llama-server.

@ngxson
Collaborator Author

ngxson commented Dec 31, 2024

@ericcurtin I have no idea whether they support third-party OpenAI-compatible servers or not. Judging from their README, they kinda support it via the :ollama docker image tag, but I'm not sure if that means "image with ollama built-in" or "bring your own ollama server".

In either case, I think they rely on /v1/chat/completions, which we already have in llama.cpp. So it's not related to the current PR.

@ngxson ngxson merged commit 5896c65 into ggml-org:master Dec 31, 2024
50 checks passed
@mostlygeek
Contributor

> This is not supported atm. But this logic seems like something more suitable for a proxy/routing layer rather than implementing it in llama-server.

I wrote llama-swap for just this purpose. It’s a transparent proxy that will swap llama-server based on the model name in the api call. It’s a single golang binary with no dependencies so it is easy to deploy.
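To make the proxy/routing idea concrete, here is a minimal sketch of the core decision such a layer has to make: read the `model` field from an OpenAI-style request body and map it to an upstream server. This is not how llama-swap is implemented; the route table, addresses, and function name are hypothetical and exist only for illustration.

```python
import json

# Hypothetical route table: model name -> base URL of a llama-server instance.
ROUTES = {
    "model-a": "http://127.0.0.1:8081",
    "model-b": "http://127.0.0.1:8082",
}


def pick_upstream(raw_body, routes, default=None):
    """Return the upstream base URL for the model named in a JSON request body.

    Falls back to `default` when the body is not valid JSON, has no string
    "model" field, or names a model that is not in the route table.
    """
    try:
        body = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return default
    model = body.get("model") if isinstance(body, dict) else None
    if not isinstance(model, str):
        return default
    return routes.get(model, default)
```

A real proxy would then forward the unchanged request to the selected upstream and relay the response back, which keeps per-request model selection out of llama-server itself.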

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
* server : add OAI compat for /v1/completions

* add test

* add docs

* better docs
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025

Labels

  • breaking change: Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility.
  • examples
  • python: python script changes
  • server


4 participants