A ScyllaDB Community
Bridging epoll and io_uring
in Async Rust
Tzu Gwo
Co-founder, CEO
Tzu Gwo (he/him/his)
Co-founder, CEO
■ Worked on the infrastructure team at ByteDance, delivering
10+ PB/day of time-series data processing
■ Love Sci-Fi, big fan of Peter Watts
■ Find me on Twitter: @yochowgwo
■ Founded tonbo.io in 2024
Background & Challenge
io_uring: unified disk and network I/O in an async way
■ One async interface for both sockets and files
● network and storage I/O use the same completion queue.
■ No thread-pool detour for files
● disk I/O can be issued asynchronously just like sockets.
■ Other benefits
● Lower syscall and context-switch overhead
● Lower tail latency
● …
Async I/O in Rust Today
Unified Traits, Hidden Limitations
io_uring supports both disk and network async I/O,
but not in the same way as poll-based APIs
compio
monoio
tokio_uring
Why io_uring Doesn’t Fit Today’s Async Rust
■ epoll: “Reading into the buffer still happens synchronously when you poll.”
● The kernel does not proactively fill the buffer.
● After user code is woken, it still has to call read(), and at that moment the data is
synchronously copied into the user-provided buffer.
■ io_uring: “The kernel may fill the buffer asynchronously at any time, and you
only get a completion event once it’s done.”
● When submitting an SQE, the user already hands the buffer address to the kernel.
● The kernel or DMA can fill this buffer at any time after submission.
● Once complete, the kernel signals via a CQE and wakes the user-space Future.
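The two models above imply different buffer-ownership contracts. A minimal, runnable sketch of the contrast (trait names `ReadinessRead`/`CompletionRead` and the `Mem` stand-in are illustrative, not from any real crate; the owned-buffer `(io::Result<usize>, Vec<u8>)` return shape is the one tokio-uring and monoio actually use):

```rust
use std::future::Future;
use std::io;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Readiness model (epoll): the caller lends `&mut [u8]`. The kernel never
// touches this buffer; the copy happens synchronously inside `read`, after
// the fd was reported ready.
trait ReadinessRead {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

// Completion model (io_uring): the buffer is moved in, because the kernel
// may write into it at any time between SQE submission and the CQE, and is
// handed back to the caller together with the result.
trait CompletionRead {
    async fn read(&mut self, buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>);
}

// In-memory stand-in so the contrast runs without a real ring.
struct Mem(Vec<u8>);

impl ReadinessRead for Mem {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.0.len().min(buf.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        Ok(n)
    }
}

impl CompletionRead for Mem {
    async fn read(&mut self, mut buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
        let n = self.0.len().min(buf.capacity());
        buf.clear();
        buf.extend_from_slice(&self.0[..n]);
        (Ok(n), buf) // ownership of the buffer returns to the caller
    }
}

// Minimal executor: the future above never pends, so polling in a loop
// with a no-op waker is enough to drive it to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw_clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn no_op(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(raw_clone, no_op, no_op, no_op);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut m = Mem(b"hello".to_vec());

    let mut stack_buf = [0u8; 8];
    let n = ReadinessRead::read(&mut m, &mut stack_buf).unwrap();
    println!("readiness: {} bytes", n);

    let (res, buf) = block_on(CompletionRead::read(&mut m, Vec::with_capacity(8)));
    println!("completion: {} bytes, buffer came back (len {})", res.unwrap(), buf.len());
}
```

Note that the readiness version can safely read into a stack array, while the completion version cannot take a borrowed buffer at all: the future could be dropped while the kernel still owns the memory, which is exactly the mismatch the rest of this talk is about.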
Existing Approaches
Tokio-uring’s approach
■ A standalone runtime crate by the Tokio team.
■ Pros:
● Familiar to Tokio users, official experiment.
■ Cons:
● Different API set (not drop-in).
● Low maintenance in recent years.
Monoio’s approach
■ Built around io_uring from day one.
■ Pros:
● High performance, no blocking detour.
● Modern async design.
■ Cons:
● Incompatible with Tokio ecosystem (most crates expect tokio::io).
● Hard to reuse existing async Rust libraries.
Neon Hybrid
■ Mechanism: io_uring → eventfd → epoll → Tokio AsyncFd → Future.
■ Pros:
● Keeps Tokio as the executor.
● Eliminates spawn_blocking overhead for file I/O.
■ Cons:
● Still needs buffer copies.
● Doesn’t expose registered buffer / batch features.
■ Diagram: SQE → io_uring → eventfd → epoll → Tokio reactor → Future.
Tokio Official Integration
■ Ongoing effort: integrate io_uring support into tokio::fs.
■ Current status:
● Basic file ops (open/read/write) under development.
● Advanced features (registered buffers, batch submit, sharding) are future work.
■ Implication:
● Goal is transparent replacement, but currently limited.
Our Approach: fusio
Fusio — Compile-time Switch
■ Core idea:
● Define I/O traits (e.g. ReadAt, WriteAll) that abstract over the backend.
● Provide multiple backend implementations: Tokio (epoll), Tokio-uring, Monoio, etc.
● Use Cargo features and type aliases to select the backend at compile time.
■ Takeaway: One API, multiple I/O engines. No app code changes.
Fusio — Compile-time Switch
■ With --features=tokio → runs on epoll + spawn_blocking.
■ With --features=tokio-uring → runs on io_uring completion.
■ App code unchanged.
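The feature-gated switch can be sketched as follows. All module and type names here are hypothetical stand-ins, not fusio's actual internals; the point is the mechanism: one trait, two backends, and a cfg-selected alias so app code never changes.

```rust
use std::io;

// The shared trait every backend implements (fusio's real ReadAt is async
// and uses owned buffers; a sync version keeps this sketch self-contained).
trait ReadAt {
    fn read_at(&self, buf: &mut [u8], pos: u64) -> io::Result<usize>;
}

mod epoll_backend {
    // Stand-in for the Tokio backend (epoll + spawn_blocking).
    pub struct File(pub Vec<u8>);
    impl super::ReadAt for File {
        fn read_at(&self, buf: &mut [u8], pos: u64) -> std::io::Result<usize> {
            let src = &self.0[pos as usize..];
            let n = src.len().min(buf.len());
            buf[..n].copy_from_slice(&src[..n]);
            Ok(n)
        }
    }
}

mod uring_backend {
    // Stand-in for the tokio-uring completion backend.
    pub struct File(pub Vec<u8>);
    impl super::ReadAt for File {
        fn read_at(&self, buf: &mut [u8], pos: u64) -> std::io::Result<usize> {
            let src = &self.0[pos as usize..];
            let n = src.len().min(buf.len());
            buf[..n].copy_from_slice(&src[..n]);
            Ok(n)
        }
    }
}

// The compile-time switch: app code names `backend::File` and is oblivious
// to which engine a given build selected.
#[cfg(feature = "tokio-uring")]
use uring_backend as backend;
#[cfg(not(feature = "tokio-uring"))]
use epoll_backend as backend;

fn main() {
    let f = backend::File(b"hello world".to_vec());
    let mut buf = [0u8; 5];
    let n = f.read_at(&mut buf, 6).unwrap();
    println!("{}", std::str::from_utf8(&buf[..n]).unwrap()); // prints "world"
}
```

Because the choice is made by `cfg`, the unused backend is compiled out entirely; the cost is that one binary cannot switch engines at runtime, which is the compile-time limitation listed under Cons below.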
Pros & Cons
■ Pros
● Clean abstraction, no runtime hacks.
● Middleware/app code unchanged.
● Cross-platform: use epoll where io_uring is unavailable.
■ Cons
● Backend chosen at compile time (not runtime).
● Advanced io_uring features (registered buffers, O_DIRECT, batching) still require new APIs.
Value for Databases
■ Why it matters:
● Storage engines and DB middleware can target Fusio traits, not Tokio directly.
● Get io_uring benefits on Linux 5.10+ without rewriting code.
● Still compatible with existing Tokio ecosystem (when built with epoll backend).
■ Takeaway:
● “Fusio makes async Rust storage code I/O-agnostic at compile time.”
Beyond fusio
The Buffer Problem
■ Current async traits
● AsyncRead/Write expect caller-provided &mut [u8].
● Works for epoll (readiness: copy happens synchronously).
■ Problem with io_uring
● Kernel may fill buffer at any time after submission.
● Runtime must guarantee buffer lifetime, alignment, pinning.
■ Implication
● If we stick to current API, runtime has to copy from its own buffer → extra overhead.
■ Takeaway: To fully unlock io_uring, we need new buffer semantics.
Runtime-owned, Refcounted Buffers
■ Idea: Runtime manages a pool of pinned/aligned buffers.
■ API sketch:
■ Buf properties
● Safe (RAII, pinned, refcounted).
● Can be pre-registered with io_uring.
● Reusable, batch-friendly, O_DIRECT compatible.
■ Pros:
● True zero-copy, exploit io_uring’s strengths (registered buffers, batching).
■ Cons:
● Diverges from today’s AsyncRead/Write API.
● Requires ecosystem adoption.
■ Diagram: Buf pool → submit SQE → kernel fills → CQE → return Buf.
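The "API sketch" bullet above could look something like the following. This is a hypothetical sketch (the `Pool`/`Buf` names and the `acquire` function are illustrative); a real implementation would additionally pin and align the allocations, pre-register them with io_uring, and hand out shared refcounted views, but the RAII shape is the same.

```rust
use std::sync::{Arc, Mutex};

// A pool of fixed-size buffers owned by the runtime. In a real runtime
// these allocations would be page-aligned (for O_DIRECT) and registered
// with io_uring so SQEs can reference them by index.
struct Pool {
    free: Mutex<Vec<Vec<u8>>>,
}

// A buffer checked out of the pool. While a Buf is alive, its memory is
// guaranteed valid, so the kernel can safely fill it after submission.
struct Buf {
    data: Vec<u8>,
    pool: Arc<Pool>,
}

impl Pool {
    fn new(count: usize, size: usize) -> Arc<Self> {
        Arc::new(Pool {
            free: Mutex::new((0..count).map(|_| vec![0u8; size]).collect()),
        })
    }
}

fn acquire(pool: &Arc<Pool>) -> Option<Buf> {
    let data = pool.free.lock().unwrap().pop()?;
    Some(Buf { data, pool: Arc::clone(pool) })
}

impl Drop for Buf {
    // RAII: dropping a Buf hands the memory back to the pool instead of
    // freeing it, so a slot registered with the kernel is never deallocated
    // and can be reused by the next submission.
    fn drop(&mut self) {
        self.pool.free.lock().unwrap().push(std::mem::take(&mut self.data));
    }
}

impl std::ops::Deref for Buf {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.data
    }
}

fn main() {
    let pool = Pool::new(2, 4096);
    let a = acquire(&pool).unwrap();
    let b = acquire(&pool).unwrap();
    assert!(acquire(&pool).is_none()); // pool exhausted
    drop(a); // slot returns to the pool...
    let c = acquire(&pool).unwrap(); // ...and is reused, not reallocated
    println!("buf len = {}", c.len());
    drop((b, c));
}
```

The design choice worth noting: the pool, not the caller, owns the memory. That inverts today's AsyncRead/Write contract, which is exactly why this API diverges from the existing traits and needs ecosystem adoption.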
Conclusion
Conclusion
■ Async Rust today → epoll-based, with inconsistent file I/O.
■ Existing approaches → Monoio (fast but isolated), Tokio-uring (separate
runtime), Neon hybrid (bridge in Tokio), Tokio official (in progress).
■ Fusio → Clean compile-time abstraction, same API for epoll/io_uring,
middleware/app code unchanged.
■ Beyond Fusio → To fully unlock io_uring: runtime-owned buffer APIs.
Thank you! Let’s connect.
Tzu Gwo
tzu@tonbo.io
@yochowgwo
https://tonbo.io/blogs
