Digital products and their users need privacy, reliability, cost control, and the option to stay independent of closed-source model providers.
Paddler is an open-source LLM load balancer and serving platform. It lets you deploy, scale, and run inference on LLMs on your own infrastructure, providing a great developer experience along the way.

Key features:
- Inference through a built-in llama.cpp engine
- LLM-specific load balancing
- Works through agents that can be added dynamically, allowing integration with autoscaling tools
- Request buffering, enabling scaling from zero hosts
- Dynamic model swapping
- Built-in web admin panel for management, monitoring, and testing
- Observability metrics
Paddler is a good fit for:
- Product teams that need LLM inference and embeddings in their features
- DevOps/LLMOps teams that need to run and deploy LLMs at scale
- Organizations handling sensitive data with high compliance and privacy requirements (medical, financial, etc.)
- Organizations wanting to achieve predictable LLM costs instead of being exposed to per-token pricing
- Product leaders who need reliable model performance to maintain a consistent user experience of their AI-based features
Paddler is self-contained in a single binary, so all you need to start using it is to obtain the `paddler` binary and make it available on your system.
You can obtain the binary by:
- Downloading the latest release from our GitHub releases
- Building Paddler from source (MSRV is 1.88.0; see the sketch below)
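If you build from source, a minimal sketch looks like this (the repository URL is an assumption, and the web admin panel may require extra frontend build steps; check the project's own build instructions):

```shell
# Assumed repository location; adjust if the project lives elsewhere
git clone https://github.com/distantmagic/paddler.git
cd paddler

# Requires a Rust toolchain at or above the 1.88.0 MSRV
cargo build --release

# The resulting binary lands in target/release/paddler
./target/release/paddler --help
```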
Once the binary is available on your system, you can start using Paddler. All of Paddler's functionality is exposed through the `paddler` command (running `paddler --help` lists all available commands).
There are only two deployable components: the `balancer` (which distributes the incoming requests) and the `agent` (which generates tokens and embeddings through slots).
To start the balancer, run:

```shell
paddler balancer \
    --inference-addr 127.0.0.1:8061 \
    --management-addr 127.0.0.1:8060 \
    --web-admin-panel-addr 127.0.0.1:8062
```
The `--web-admin-panel-addr` flag is optional, but it lets you view your setup in a web browser (with the example above, at http://127.0.0.1:8062).
And to start an agent with, for example, 4 slots, run:

```shell
paddler agent --management-addr 127.0.0.1:8060 --slots 4
```
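Agents register with the balancer over the management address, so you can grow a fleet at runtime. As a sketch (the slot count is arbitrary here), a second agent on the same host joins the running balancer with:

```shell
# Registers with the balancer started above and adds 2 more slots
paddler agent --management-addr 127.0.0.1:8060 --slots 2
```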
Read more about installation and setting up a basic cluster.
Visit our documentation page to install Paddler and get started with it.
API documentation is also available.
For questions or community conversations, use GitHub Discussions or join our Discord server. All contributions are welcome.
Paddler is built for easy setup. It comes as a self-contained binary with only two deployable components: the `balancer` and the `agents`.
The `balancer` exposes the following:
- Inference service (used by applications that connect to it to obtain tokens or embeddings)
- Management service, which manages Paddler's setup internally
- Web admin panel that lets you view and test your Paddler setup
Agents are usually deployed on separate instances. They further distribute the incoming requests to slots, which are responsible for generating tokens and embeddings.
Paddler uses a built-in llama.cpp engine for inference, but has its own implementation of llama.cpp slots, which keep their own context and KV cache.
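As a sketch of a multi-host layout (the addresses below are illustrative assumptions), the balancer can listen on all interfaces while agents on other machines point at its management address:

```shell
# On the balancer host (assumed reachable at 10.0.0.1)
paddler balancer --inference-addr 0.0.0.0:8061 --management-addr 0.0.0.0:8060

# On each agent host
paddler agent --management-addr 10.0.0.1:8060 --slots 4
```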
Paddler comes with a built-in web admin panel. You can use it to:
- Monitor your Paddler fleet
- Add and update your model, and customize the chat template and inference parameters
- Test the inference through a GUI
Our documentation also covers how to:
- Set up a basic LLM cluster
- Use Paddler's web admin panel
- Generate tokens and embeddings
- Use function calling
- Create a multi-agent fleet
- Go beyond a single device
We initially wanted to use the Raft consensus algorithm (thus Paddler, because it paddles on a Raft), but eventually dropped that idea. The name stayed, though.
Later, people started sending us the "that's a paddlin'" clip from The Simpsons, and we just embraced it.