KEMBAR78
vortex: Add CI, rust driver by brson · Pull Request #3017 · tigerbeetle/tigerbeetle · GitHub
Skip to content

Conversation

brson
Copy link
Contributor

@brson brson commented Jun 9, 2025

Here are a bunch of changes to get vortex moving again.

Background: vortex runs as a group of processes, one of which is a language-specific driver. There is a built-in driver for zig, a java driver, and this PR adds a rust driver. Vortex really wants to be run in its own linux namespace so that it controls the entire network, and theoretically I think so that all processes can be torn down automatically when the supervisor exits. As such it only really works correctly under linux.

This PR:

  • adds a rust driver
  • adds java/rust vortex smoke tests to the existing client ci infra
  • adds a run command that sets up the linux namespace, previously not handled directly by vortex

It also makes various small tweaks to accomplish the above.

The reduction of the JDK minimum version is because my development machine has that particular JDK. Seems to work fine.

@brson
Copy link
Contributor Author

brson commented Jun 9, 2025

The ubuntu runnner failure is because ubuntu needs a system-level configuration change to support the unshare command (other linuxes don't). I'll figure out the best place to do that later.


const log = std.log.scoped(.run);

pub fn main(allocator: std.mem.Allocator) !void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of shelling out to unshare, could the supervisor just run the appropriate std.os.linux.unshare() syscalls directly to set up the new namespaces? Then we don't need a separate binary/process to handle it.

(Now that I think of it, the same approach would probably be useful for the cfo supervisor. That is out of scope here though.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

I also implemented ip link set up dev lo as netlink commands.

@brson
Copy link
Contributor Author

brson commented Aug 10, 2025

Some reviews done, a few left.

@brson
Copy link
Contributor Author

brson commented Aug 10, 2025

@sentientwaffle all reviews addressed.

@brson
Copy link
Contributor Author

brson commented Aug 10, 2025

CI failing because the unshare implementation is not working, can't open /proc/self/setgroups

+- run scripts failure
rust ci: 5m41.413s
2025-08-10 22:44:30.858Z error: process failed with ExecNonZeroExitStatus: ../../../../zig-out/bin/vortex run --driver-command=../../../../src/testing/vortex/rust_driver/target/debug/vortex-driver-rust --tigerbeetle-executable=../../../../zig-out/bin/tigerbeetle --output-directory=/home/runner/work/tigerbeetle/tigerbeetle/.zig-cache/tmp/8530484706316026764 --disable-faults --test-duration-seconds=10
2025-08-10 22:44:30.858Z error: stderr:
++++
2025-08-10 22:44:30.669Z error(run): Failed to open /proc/self/setgroups: error.AccessDenied

@brson
Copy link
Contributor Author

brson commented Aug 13, 2025

Neither the hand-coded unshare implementation nor just calling unshare work on the CI for slightly different access-denied reasons. It seems like there's some capability that the processes in the CI container don't have.

@brson brson force-pushed the vortex-1 branch 2 times, most recently from 74693f7 to d947815 Compare August 13, 2025 20:43
@brson brson force-pushed the vortex-1 branch 4 times, most recently from d59e8b3 to 12be0a3 Compare August 13, 2025 21:52
@brson
Copy link
Contributor Author

brson commented Aug 13, 2025

I think unshare is working with CI now. Problem now is that the java vortex driver doesn't build with JDK 11. I'll just rewrite it so it does.

@brson
Copy link
Contributor Author

brson commented Aug 14, 2025

I converted the vortex driver to Java 11 to be compatible with the CI JDK. The resulting code is sadly much more verbose by not having record syntax.

I think this is all ready to go.

Copy link
Member

@sentientwaffle sentientwaffle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, modulo a few nits and questions!

Comment on lines -10 to -11
<maven.compiler.source>21</maven.compiler.source>
<maven.compiler.target>21</maven.compiler.target>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to use such an old jvm?

Copy link
Contributor Author

@brson brson Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can not find that information other than CI tests JDK 11.

Comment on lines +39 to +47
// Run the supervisor
var child = std.process.Child.init(argv.items, allocator);
child.stdin_behavior = .Inherit;
child.stdout_behavior = .Inherit;
child.stderr_behavior = .Inherit;

try child.spawn();
const result = try child.wait();
std.process.exit(@intFromEnum(result));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope, but I'm curious -- do unshare/ip_link_loopback affect this current process, or only the child process? e.g. is it strictly required that the supervisor be a child process here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things unshare does affect the current process, some only take effect on the child process.

The child process is required here to make the process reaping behavior work. With the new pid namespace, the child supervisor process will become the special pid 1 and we'll get the benefit of the behavior where every other process dies when the supervisor dies.

This does lead to the question of what happen when the parent "run" process (not pid 1) is killed - is the supervisor not killed? I have not investigated closely nor noticed any annoying behavior.

Comment on lines +71 to +93
// Create user namespace first
const unshare_user_result = std.os.linux.unshare(linux.CLONE.NEWUSER);
const unshare_user_errno = std.posix.errno(unshare_user_result);
if (unshare_user_errno != .SUCCESS) {
log.err("Failed to create user namespace: {}", .{unshare_user_errno});
return error.UnshareFailure;
}

// Create PID namespace
const unshare_pid_result = std.os.linux.unshare(linux.CLONE.NEWPID);
const unshare_pid_errno = std.posix.errno(unshare_pid_result);
if (unshare_pid_errno != .SUCCESS) {
log.err("Failed to create pid namespace: {}", .{unshare_pid_errno});
return error.UnshareFailure;
}

// Create network namespace
const unshare_net_result = std.os.linux.unshare(linux.CLONE.NEWNET);
const unshare_net_errno = std.posix.errno(unshare_net_result);
if (unshare_net_errno != .SUCCESS) {
log.err("Failed to create net namespace: {}", .{unshare_net_errno});
return error.UnshareFailure;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice -- I guess the unshare calls could be combined, but the more detailed error messages make splitting them up worth it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly yes. And I think there's some technical reason the user namespace has to be created separately anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the example test command be swapped with a simpler vortex run command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the readme.

I will write better vortex docs once it comes together a bit more. Right now it has the big problem that the ability to build and configure individual drivers is spread across several places. In a future patch I will make it easier to use vortex manually: putting the driver build steps into the tigerbeetle build system, and probably also putting some kind of run-vortex:java commands into the build system.

}
Ok(Some(Request::LookupTransfers(events)))
}
_ => todo!("{op}"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually print the op variable, or just the string "{op}"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is a formatting string interpolation that captures op.

brson and others added 4 commits August 20, 2025 12:18
Co-authored-by: djg <sentientwaffle@gmail.com>
Co-authored-by: djg <sentientwaffle@gmail.com>
@brson
Copy link
Contributor Author

brson commented Aug 20, 2025

@sentientwaffle I think I've addressed all new reviews.

@brson brson added this pull request to the merge queue Aug 20, 2025
Merged via the queue into tigerbeetle:main with commit 0321207 Aug 20, 2025
38 checks passed
@brson brson deleted the vortex-1 branch August 20, 2025 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants