-
Notifications
You must be signed in to change notification settings - Fork 698
vortex: Add CI, rust driver #3017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is useful for smoke tests and development.
|
The ubuntu runnner failure is because ubuntu needs a system-level configuration change to support the |
|
|
||
| const log = std.log.scoped(.run); | ||
|
|
||
| pub fn main(allocator: std.mem.Allocator) !void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of shelling out to unshare, could the supervisor just run the appropriate std.os.linux.unshare() syscalls directly to set up the new namespaces? Then we don't need a separate binary/process to handle it.
(Now that I think of it, the same approach would probably be useful for the cfo supervisor. That is out of scope here though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I also implemented ip link set up dev lo as netlink commands.
|
Some reviews done, a few left. |
|
@sentientwaffle all reviews addressed. |
|
CI failing because the unshare implementation is not working, can't open |
|
Neither the hand-coded unshare implementation nor just calling unshare work on the CI for slightly different access-denied reasons. It seems like there's some capability that the processes in the CI container don't have. |
74693f7 to
d947815
Compare
d59e8b3 to
12be0a3
Compare
|
I think unshare is working with CI now. Problem now is that the java vortex driver doesn't build with JDK 11. I'll just rewrite it so it does. |
|
I converted the vortex driver to Java 11 to be compatible with the CI JDK. The resulting code is sadly much more verbose by not having I think this is all ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, modulo a few nits and questions!
| <maven.compiler.source>21</maven.compiler.source> | ||
| <maven.compiler.target>21</maven.compiler.target> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to use such an old jvm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not find that information other than CI tests JDK 11.
| // Run the supervisor | ||
| var child = std.process.Child.init(argv.items, allocator); | ||
| child.stdin_behavior = .Inherit; | ||
| child.stdout_behavior = .Inherit; | ||
| child.stderr_behavior = .Inherit; | ||
|
|
||
| try child.spawn(); | ||
| const result = try child.wait(); | ||
| std.process.exit(@intFromEnum(result)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope, but I'm curious -- do unshare/ip_link_loopback affect this current process, or only the child process? e.g. is it strictly required that the supervisor be a child process here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things unshare does affect the current process, some only take effect on the child process.
The child process is required here to make the process reaping behavior work. With the new pid namespace, the child supervisor process will become the special pid 1 and we'll get the benefit of the behavior where every other process dies when the supervisor dies.
This does lead to the question of what happen when the parent "run" process (not pid 1) is killed - is the supervisor not killed? I have not investigated closely nor noticed any annoying behavior.
| // Create user namespace first | ||
| const unshare_user_result = std.os.linux.unshare(linux.CLONE.NEWUSER); | ||
| const unshare_user_errno = std.posix.errno(unshare_user_result); | ||
| if (unshare_user_errno != .SUCCESS) { | ||
| log.err("Failed to create user namespace: {}", .{unshare_user_errno}); | ||
| return error.UnshareFailure; | ||
| } | ||
|
|
||
| // Create PID namespace | ||
| const unshare_pid_result = std.os.linux.unshare(linux.CLONE.NEWPID); | ||
| const unshare_pid_errno = std.posix.errno(unshare_pid_result); | ||
| if (unshare_pid_errno != .SUCCESS) { | ||
| log.err("Failed to create pid namespace: {}", .{unshare_pid_errno}); | ||
| return error.UnshareFailure; | ||
| } | ||
|
|
||
| // Create network namespace | ||
| const unshare_net_result = std.os.linux.unshare(linux.CLONE.NEWNET); | ||
| const unshare_net_errno = std.posix.errno(unshare_net_result); | ||
| if (unshare_net_errno != .SUCCESS) { | ||
| log.err("Failed to create net namespace: {}", .{unshare_net_errno}); | ||
| return error.UnshareFailure; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice -- I guess the unshare calls could be combined, but the more detailed error messages make splitting them up worth it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly yes. And I think there's some technical reason the user namespace has to be created separately anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the example test command be swapped with a simpler vortex run command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the readme.
I will write better vortex docs once it comes together a bit more. Right now it has the big problem that the ability to build and configure individual drivers is spread across several places. In a future patch I will make it easier to use vortex manually: putting the driver build steps into the tigerbeetle build system, and probably also putting some kind of run-vortex:java commands into the build system.
| } | ||
| Ok(Some(Request::LookupTransfers(events))) | ||
| } | ||
| _ => todo!("{op}"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually print the op variable, or just the string "{op}"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is a formatting string interpolation that captures op.
Co-authored-by: djg <sentientwaffle@gmail.com>
Co-authored-by: djg <sentientwaffle@gmail.com>
|
@sentientwaffle I think I've addressed all new reviews. |
Here are a bunch of changes to get vortex moving again.
Background: vortex runs as a group of processes, one of which is a language-specific driver. There is a built-in driver for zig, a java driver, and this PR adds a rust driver. Vortex really wants to be run in its own linux namespace so that it controls the entire network, and theoretically I think so that all processes can be torn down automatically when the supervisor exits. As such it only really works correctly under linux.
This PR:
runcommand that sets up the linux namespace, previously not handled directly by vortexIt also makes various small tweaks to accomplish the above.
The reduction of the JDK minimum version is because my development machine has that particular JDK. Seems to work fine.