Record most recent input's mtime in log file for correct dirty status computation #1165
Conversation
(force-pushed from 7b41e18 to 215541b)
rebased on current origin/master
(force-pushed from 215541b to f02ae3f)
rebased on current origin/master
@evmar @nico Is there any way you guys could take a look at this and get it merged in? We're still experiencing the negative effects of ninja without this patch, and it's been over a year now since I originally opened the pull request. Thank you for all the hard work you put in to make such a great build tool.
(force-pushed from 8b2e64d to b40919b)
This adds a whole bunch of machinery for something most people don't seem to need, and that has a workaround that doesn't need ninja changes (see discussion on #1162). So I'm a bit reluctant about merging this.
The workaround doesn't actually work. Wrapping the build commands seems like a good idea initially, but it doesn't capture implicit dependencies. This pull request removes an inherent race condition between the time an output starts building and the time it finishes. I don't think this code is too complex to merge, when it actually makes ninja produce more correct builds. Correctness is a good thing, and worth the tiny bit of extra complexity to ensure it.
Wrappers are a nightmare. They introduce a layer of indirection that "hopefully" doesn't cause any sort of conflict with the tool. Besides, the only good use of a wrapper is to provide a nicer interface around a tool in order to reduce its complexity (see Clang's and GCC's drivers). Also, as I understand the reason for the change, ninja right now cannot prove that its build state is valid. If there is any way for a file to be modified after a job for an output has started, the build state is indeterminate, as ninja has no way of detecting that the input files changed between when the job started and when it finished. This change snapshots a timestamp as the job is started, which means that any further changes outside ninja's control can't leave it in an invalid state. I would think that a build tool that can't prove that it's in a valid state isn't very useful.
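To make the race concrete, here is a minimal, self-contained C++ sketch of the classic mtime comparison that goes wrong (illustrative names and simplified types, not ninja's actual code):

```cpp
#include <cstdio>
#include <ctime>

using Mtime = time_t;  // simplified: mtimes as plain values

// Classic dirty check: an edge is considered clean if its output is
// at least as new as its newest input.
bool IsCleanClassic(Mtime output, Mtime newest_input) {
  return output >= newest_input;
}

int main() {
  // Timeline of the race:
  //   t=10  job starts; the compiler reads the input
  //   t=11  the input is edited and saved (too late to be picked up)
  //   t=12  the output is written to disk
  Mtime input = 11, output = 12;
  // The stale output looks clean on every subsequent run, even
  // though it was built from the pre-edit input.
  std::printf("clean? %s\n", IsCleanClassic(output, input) ? "yes" : "no");
  return 0;
}
```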
(force-pushed from b40919b to 5d19cb6)
@nico I understand your reluctance to merge this. However, I want to implore you to reconsider the discussion in #1162 and realize that the workarounds mentioned do not work. If we consider the number one goal of a build system, isn't it to ensure that all builds are correct and repeatable? Without my fix, ninja is fundamentally broken, as you can't truly guarantee that outputs are up to date (try running the simple test case I added without my fix in place). This fix isn't even that invasive: it adds a new field to the log file, but that's about the most complex thing it does. Please consider merging this early in the release cycle so it can get as much testing as possible. I think we can all agree that a provably correct build system is worth it, and that everyone needs this even if they don't realize it.
This would be a very useful change, as a build system that you can't trust is... suboptimal. The change itself appears to be fairly minimal.
(force-pushed from 5d19cb6 to 1a01827)
I changed this PR's implementation a bit after going through the changes for nanosecond timestamp precision. Instead of taking a snapshot of the system time when an edge begins generating its outputs, the log file will now just record the mtime of the most recent input, as calculated when ninja is determining whether an edge is dirty. This is even more correct than my original PR, and dovetails nicely with #1219, since this improvement is based solely on the mtimes of actual files (instead of also trying to compare against a snapshot of the current system time). I updated the initial description of the PR to reflect the new implementation and to explain how this fixes the original problem.
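A rough sketch of what "record the most recent input's mtime" could look like, under my own simplifying assumptions (the names `Edge`, `Input`, and `RecordMostRecentInput` are illustrative, not taken from the actual diff):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

using TimeStamp = int64_t;  // ninja-style convention: -1 means "unknown"

struct Input {
  std::string path;
  TimeStamp mtime;
};

// Hypothetical edge: the dirty-status scan captures the newest input
// mtime once, and that exact value is later written to the build log
// entry for each output when the edge completes.
struct Edge {
  std::vector<Input> inputs;         // explicit + implicit dependencies
  TimeStamp most_recent_input = -1;  // filled in during the dirty scan
};

// During the dirty scan: remember the newest input mtime.
void RecordMostRecentInput(Edge* e) {
  for (const Input& in : e->inputs)
    e->most_recent_input = std::max(e->most_recent_input, in.mtime);
}

// On completion, this value (not the output's own mtime) is what a
// later run compares against, so edits made mid-build are caught.
TimeStamp LogEntryInputTime(const Edge& e) {
  return e.most_recent_input;
}
```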
(force-pushed from 1a01827 to 5ace915)
Record most recent input's mtime in log file for correct dirty status computation

If an edge's output files' mtimes are compared to the most recent input's mtime, edges might be calculated as clean even if they are actually dirty. If an input to the edge is updated while the edge is running its rule to produce its outputs, before the outputs are written to disk, then subsequent runs will think that the outputs are newer than the inputs, even though the inputs have actually been updated and may differ from what was used to produce those outputs. Instead of comparing output mtimes, ninja will add the most recent input's mtime to the build log file when an edge completes, provided all implicit dependencies were stat'd before the output began to build (i.e. all implicit dependencies were already loaded via the deps log file). It looks at both the explicit and implicit dependencies to get the most recent input. On subsequent runs, if a valid mtime was recorded in the log file, ninja will compare it to the edge's most recent input's mtime to determine whether the outputs are dirty.
(force-pushed from 5ace915 to 41491a2)
Hello, any chance this could be merged soon? Thank you!
I will review this soon, but it definitely won't make it into 1.9, sorry. |
@jhasse Thank you for taking some time to look at it. Before you get too far into reviewing, I'd like to hear your thoughts on the two possible implementations to solve #1162. (1) This PR as it stands, which does not solve the issue 100% of the time, since it requires all inputs to already be in the deplog before it will use the input mtimes for dirty status checking. This means that the very first ninja run, or any newly introduced input dependencies, are still subject to the race condition. (2) The other solution (which was the first approach I took): take a system timestamp right when a build begins, and stamp that time in the deplog for each generated output. This makes ninja correct 100% of the time, but might require various system-dependent mechanisms for getting the most accurate (filesystem-mtime-equivalent) system time; Linux, Windows, and FreeBSD all have their own syscalls and interfaces for this. I actually prefer (2) here, but before I modify the PR to do it this way, I'd like to get consensus. (2) will require some additional platform-specific code for getting the current system time, but would make ninja 100% correct.
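For reference, a minimal sketch of the platform-specific time snapshot that (2) would need. This is my illustration of the syscalls involved, not code from the PR:

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

// Wall-clock time in nanoseconds since the Unix epoch, at the best
// resolution the platform offers. A snapshot taken just before a job
// starts is what approach (2) would stamp into the log for each
// generated output.
int64_t HighResSystemTimeNs() {
#ifdef _WIN32
  // FILETIME counts 100ns ticks since 1601-01-01.
  FILETIME ft;
  GetSystemTimePreciseAsFileTime(&ft);
  uint64_t ticks =
      (uint64_t(ft.dwHighDateTime) << 32) | uint64_t(ft.dwLowDateTime);
  const uint64_t kEpochDelta = 116444736000000000ULL;  // 1601 -> 1970
  return int64_t(ticks - kEpochDelta) * 100;
#else
  // POSIX (Linux/FreeBSD): nanosecond-resolution realtime clock.
  timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  return int64_t(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
#endif
}
```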
FWIW I'd still recommend that you maintain this on your fork as mentioned above. Re "fundamentally broken": That's true for this use case, but that use case isn't common, and it doesn't mean ninja isn't still useful. Make has had the same issue for decades from what I understand. (mtime-based tools are also "fundamentally broken" on NFS shares, but if you build on an NFS share you already have bigger problems, etc.)
I use ninja in my IDE. Often, after starting a build, I see an error and fix it while the build is still running. I have to make sure not to save the file while ninja is running. I think it's a common scenario. What isn't common, though, is that this leads to a bug that someone notices. But that makes it even more relevant in my opinion: hard-to-reproduce bugs are the worst ;)
Not sure if that would make it 100% correct, as there's the difference between the system time and the time the filesystem gets updated (i.e. is the filesystem operation atomic? And are all the clocks synchronized?). |
100% is probably a bit optimistic. It depends on the timestamp granularity of the filesystem being used, and on how the filesystem decides to record the mtime when writing files. If we let ninja record the most granular timestamp the OS can provide when running a command for an output, we can dramatically (and, depending on the filesystem, completely) remove the race condition that this PR tries to solve. Your point about the difference between the system time and the time the filesystem gets updated shouldn't matter with my second solution: the system timestamp (which is recorded in the deplog) is taken before the job is even started, and the only time that matters on the other end is the time the filesystem uses to stamp the mtime attribute.
Sounds good :) |
Ok, I'll work on adjusting this PR to go down that route. Stay tuned. |
@jhasse I plan on getting back to this soon. Life kinda got in the way last summer and this fell down the priority list. |
Is it a guarantee that the timestamps used by the file system match the system time? I wouldn't naively expect them to be necessarily related -- e.g. stuff like different timestamp resolution, network file systems, delayed writes and so on. I think if you're making a decision based on fs-reported timestamps the correct thing is to use those timestamps as if they were opaque. (disclaimer: haven't touched ninja in like 8 years at this point)
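In code, treating filesystem timestamps as opaque reduces to comparing for equality against a previously recorded value rather than ordering against "now". A minimal sketch of the discipline (my illustration, not anything in the PR):

```cpp
#include <cstdint>

using TimeStamp = int64_t;

// Opaque-timestamp discipline: never ask "is the input newer than the
// output?", only "is the input's mtime the same value we recorded the
// last time we built?". Any change, forwards or backwards, means the
// edge must be rebuilt.
bool NeedsRebuild(TimeStamp recorded_input_mtime,
                  TimeStamp current_input_mtime) {
  return current_input_mtime != recorded_input_mtime;
}
```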
@evmar I've had a while to think about this problem again. I agree that using a system time to compare against filesystem times is probably not the ideal solution, for many reasons, one of which is (as you mention) that we can't reliably use a filesystem-reported mtime, since it could be the system time of another machine, or not a time at all. Avery Pennarun has a great summary of the problems with mtime comparisons for build tools in his blog: https://apenwarr.ca/log/20181113 I think I'm going to close this PR and think about possible ways to incorporate some of those ideas.
My take: the best option is to hash the inputs -- not their contents, but rather the names and mtimes of the inputs treated as a blob. Then if anything changes (time forwards or backwards) we rebuild.
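A minimal sketch of that idea, hashing the (name, mtime) pairs of all inputs as one blob. FNV-1a stands in here for whatever hash a real implementation would pick; all names are illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a, used here only as a stand-in hash.
uint64_t Fnv1a(const void* data, size_t len, uint64_t h) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  for (size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 0x100000001b3ULL;  // 64-bit FNV prime
  }
  return h;
}

struct Input {
  std::string path;
  int64_t mtime;
};

// Hash the names and mtimes of all inputs as one blob. If any input's
// mtime moves forwards *or* backwards, or the input set itself
// changes, the signature changes and the edge rebuilds.
uint64_t InputSignature(const std::vector<Input>& inputs) {
  uint64_t h = 0xcbf29ce484222325ULL;  // 64-bit FNV offset basis
  for (const Input& in : inputs) {
    h = Fnv1a(in.path.data(), in.path.size(), h);
    h = Fnv1a(&in.mtime, sizeof(in.mtime), h);
  }
  return h;
}
```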
That's essentially where I'm going with it, but I might hash/track more than the name and mtime of inputs; I'm considering inode number and file size as well. The issue I'm still trying to resolve is how to do initial dependency discovery. The first time ninja builds an output, it doesn't know all its dependencies yet. The compiler can generate this information (e.g. via depfiles), but it isn't available until the first build completes, so the first run remains exposed. I want to make the final solution resilient against all of these types of issues.
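A sketch of the per-input signature this describes, extended with inode number and file size (POSIX `stat()` for illustration; Windows would need its own equivalent, and all names here are hypothetical):

```cpp
#include <sys/stat.h>

#include <cstdint>
#include <string>

// Per-input signature: mtime plus inode and size, so that content
// swaps that happen to preserve the mtime are still likely noticed.
struct FileSig {
  int64_t mtime;  // seconds; a real version would use st_mtim for ns
  uint64_t inode;
  uint64_t size;
};

// Stat a path and fill in its signature; returns false if the file
// can't be stat'd (e.g. it doesn't exist yet).
bool StatSig(const std::string& path, FileSig* out) {
  struct stat st;
  if (stat(path.c_str(), &st) != 0)
    return false;
  out->mtime = static_cast<int64_t>(st.st_mtime);
  out->inode = static_cast<uint64_t>(st.st_ino);
  out->size = static_cast<uint64_t>(st.st_size);
  return true;
}
```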
@jhasse I have another version of this PR that resolves this issue rather cleanly. It can't be perfect (due to the nature of implicit dependency discovery via depfiles), but it gets close, especially when used in combination with @bradking's related work. Do you want me to force-push this new implementation to this PR, or close this one and create a new one?
Close this one and create a new one :) |
This adds support for recording an edge's most recent input file's mtime in the build log. This fixes #1162, by giving ninja a better metric for determining whether an edge's outputs are dirty.
If an edge's output files' mtimes are compared to the most recent input's mtime, edges might be calculated as clean even if they are actually dirty. If an input to the edge is updated while the edge is running its rule to produce its outputs, before the outputs are written to disk, then subsequent runs will think that the outputs are newer than the inputs, even though the inputs have actually been updated and may differ from what was used to produce those outputs. This can be disastrous if the resulting object files still link, since some of the object files might be out of date.
Instead of comparing output mtimes, ninja will add the most recent input's mtime (calculated when the edge first computes its dirty status) to the build log file when an edge completes. On subsequent runs, ninja will compare this value against the edge's most recent input's mtime to determine whether the outputs are dirty. This is provably more correct and safer, since outputs are guaranteed to actually be up to date.
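Concretely, the difference between the old and new dirty checks looks roughly like this (illustrative names, not the actual patch):

```cpp
#include <cstdint>

using TimeStamp = int64_t;  // -1 means "no valid record in the log"

// Old check: compare the output's mtime against the newest input.
// An input edited mid-build slips through, because the output is
// written (and stamped) after the edit.
bool DirtyOld(TimeStamp output_mtime, TimeStamp newest_input_mtime) {
  return output_mtime < newest_input_mtime;
}

// New check: compare the newest input against the input mtime that
// was recorded in the build log when the edge's dirty status was
// first computed. If the inputs have moved past that point, the
// outputs were built from stale inputs and must be rebuilt.
bool DirtyNew(TimeStamp logged_input_time, TimeStamp newest_input_mtime) {
  if (logged_input_time < 0)
    return true;  // no record: assume dirty
  return newest_input_mtime > logged_input_time;
}
```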
A test is included for recording the correct input_time in the log file, ensuring that subsequent ninja runs utilize that value correctly.