Record most recent input's mtime in log file for correct dirty status computation #1165
Conversation
(force-pushed from 7b41e18 to 215541b)
rebased on current origin/master
(force-pushed from 215541b to f02ae3f)
rebased on current origin/master
@evmar @nico Is there any way you guys could take a look at this and get it merged in? We're still experiencing the negative effects of ninja without this patch, and it's been over a year now since I originally opened the pull request. Thank you for all the hard work you put in to make such a great build tool.
(force-pushed from 8b2e64d to b40919b)
This adds a whole bunch of machinery for something most people don't seem to need, and that has a workaround that doesn't need ninja changes (see discussion on #1162). So I'm a bit reluctant about merging this.
The workaround doesn't actually work. Wrapping the build commands seems like a good idea initially, but it doesn't capture implicit dependencies. This pull request removes an inherent race condition between the time an output starts building and the time it finishes. I don't think this code is too complex to merge, when it actually makes ninja produce more correct builds. Correctness is a good thing, and worth the tiny bit of extra complexity to ensure it.
Wrappers are a nightmare. They introduce a layer of indirection that "hopefully" doesn't cause any sort of conflict with the tool. Besides, the only good use of a wrapper is to provide a nicer interface around a tool in order to reduce its complexity (see Clang's and GCC's drivers). Also, as I understand the reason for the change, ninja right now cannot prove that its build state is valid. If there is any way for a file to be modified after a job for an output has started, the build state is indeterminate, as ninja has no way of detecting that the input files changed between when the job started and when it finished. This change snapshots a timestamp as the job is started, which means that any further changes outside ninja's control can't leave it in an invalid state. I would think that a build tool that can't prove that it's in a valid state isn't very useful.
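To make the race concrete, here is a minimal, self-contained C++ sketch of the classic mtime comparison that goes wrong (illustrative names and simplified types, not ninja's actual code):

```cpp
#include <cstdio>
#include <ctime>

using Mtime = time_t;  // simplified: mtimes as plain values

// Classic dirty check: an edge is considered clean if its output is
// at least as new as its newest input.
bool IsCleanClassic(Mtime output, Mtime newest_input) {
  return output >= newest_input;
}

int main() {
  // Timeline of the race:
  //   t=10  job starts; the compiler reads the input
  //   t=11  the input is edited and saved (too late to be picked up)
  //   t=12  the output is written to disk
  Mtime input = 11, output = 12;
  // The stale output looks clean on every subsequent run, even
  // though it was built from the pre-edit input.
  std::printf("clean? %s\n", IsCleanClassic(output, input) ? "yes" : "no");
  return 0;
}
```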
(force-pushed from b40919b to 5d19cb6)
@nico I understand your reluctance to merge this. However, I want to implore you to reconsider the discussion in #1162 and realize that the workarounds mentioned do not work. If we consider the number one goal of a build system, isn't it to ensure that all builds are correct and repeatable? Without my fix, ninja is fundamentally broken, as you can't truly guarantee that outputs are up to date (try running the simple test case I added without my fix in place). This fix isn't even that invasive: it adds a new field to the log file, but that's about the most complex thing it does. Please consider merging this early in the release cycle so it can get as much testing as possible. I think we can all agree that a provably correct build system is worth it, and that everyone needs this even if they don't realize it.
This would be a very useful change, as a build system that you can't trust is... suboptimal. The change itself appears to be fairly minimal.
(force-pushed from 5d19cb6 to 1a01827)
I changed this PR's implementation a bit after going through the changes for nanosecond timestamp precision. Instead of taking a snapshot of the system time when an edge begins generating its outputs, the log file will now just record the mtime of the most recent input, as calculated when ninja is determining whether an edge is dirty. This is even more correct than my original PR, and dovetails nicely with #1219, since this improvement is based solely on the mtimes of actual files (instead of also trying to compare against a snapshot of the current system time). I updated the initial description of the PR to reflect the new implementation and to explain how this fixes the original problem.
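A rough sketch of what "record the most recent input's mtime" could look like, under my own simplifying assumptions (the names `Edge`, `Input`, and `RecordMostRecentInput` are illustrative, not taken from the actual diff):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

using TimeStamp = int64_t;  // ninja-style convention: -1 means "unknown"

struct Input {
  std::string path;
  TimeStamp mtime;
};

// Hypothetical edge: the dirty-status scan captures the newest input
// mtime once, and that exact value is later written to the build log
// entry for each output when the edge completes.
struct Edge {
  std::vector<Input> inputs;         // explicit + implicit dependencies
  TimeStamp most_recent_input = -1;  // filled in during the dirty scan
};

// During the dirty scan: remember the newest input mtime.
void RecordMostRecentInput(Edge* e) {
  for (const Input& in : e->inputs)
    e->most_recent_input = std::max(e->most_recent_input, in.mtime);
}

// On completion, this value (not the output's own mtime) is what a
// later run compares against, so edits made mid-build are caught.
TimeStamp LogEntryInputTime(const Edge& e) {
  return e.most_recent_input;
}
```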
(force-pushed from 1a01827 to 5ace915)
Record most recent input's mtime in log file for correct dirty status computation

If an edge's output files' mtimes are compared to the most recent input's mtime, edges might be calculated as clean even if they are actually dirty. If an input to the edge is updated while the edge is running its rule to produce its outputs, before the outputs are written to disk, then subsequent runs will think that the outputs are newer than the inputs, even though the inputs have actually been updated and may differ from what was used to produce those outputs. Instead of comparing output mtimes, ninja will add the most recent input's mtime to the build log file when an edge completes, provided all implicit dependencies were stat'd before the output began to build (i.e. all implicit dependencies were already loaded via the deps log file). It looks at both the explicit and implicit dependencies to get the most recent input. On subsequent runs, if a valid mtime was recorded in the log file, ninja will compare it to the edge's most recent input's mtime to determine whether the outputs are dirty.
(force-pushed from 5ace915 to 41491a2)
Hello, any chance this could be merged soon? Thank you!
I will review this soon, but it definitely won't make it into 1.9, sorry. |
@jhasse Thank you for taking some time to look at it. Before you get too far into reviewing, I'd like to hear your thoughts on the two possible implementations to solve #1162. (1) This PR as it stands, which does not solve the issue 100% of the time, since it requires all inputs to already be in the deplog before it will use the input mtimes for dirty status checking. This means that the very first ninja run, or any newly introduced input dependencies, are still subject to the race condition. (2) The other solution (which was the first approach I took): take a system timestamp right when a build begins, and stamp that time in the deplog for each generated output. This makes ninja correct 100% of the time, but might require various system-dependent mechanisms for getting the most accurate (filesystem-mtime-equivalent) system time; Linux, Windows, and FreeBSD all have their own syscalls and interfaces for this. I actually prefer (2) here, but before I modify the PR to do it this way, I'd like to get consensus. (2) will require some additional platform-specific code for getting the current system time, but would make ninja 100% correct.
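For reference, a minimal sketch of the platform-specific time snapshot that (2) would need. This is my illustration of the syscalls involved, not code from the PR:

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

// Wall-clock time in nanoseconds since the Unix epoch, at the best
// resolution the platform offers. A snapshot taken just before a job
// starts is what approach (2) would stamp into the log for each
// generated output.
int64_t HighResSystemTimeNs() {
#ifdef _WIN32
  // FILETIME counts 100ns ticks since 1601-01-01.
  FILETIME ft;
  GetSystemTimePreciseAsFileTime(&ft);
  uint64_t ticks =
      (uint64_t(ft.dwHighDateTime) << 32) | uint64_t(ft.dwLowDateTime);
  const uint64_t kEpochDelta = 116444736000000000ULL;  // 1601 -> 1970
  return int64_t(ticks - kEpochDelta) * 100;
#else
  // POSIX (Linux/FreeBSD): nanosecond-resolution realtime clock.
  timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  return int64_t(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
#endif
}
```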
FWIW I'd still recommend that you maintain this on your fork as mentioned above. Re "fundamentally broken": That's true for this use case, but that use case isn't common, and it doesn't mean ninja isn't still useful. Make has had the same issue for decades from what I understand. (mtime-based tools are also "fundamentally broken" on NFS shares, but if you build on an NFS share you already have bigger problems, etc.)
I use ninja in my IDE. Often, after starting a build, I see an error and fix it while the build is still running. I have to make sure not to save the file while ninja is running. I think it's a common scenario. What isn't common, though, is that this leads to a bug that someone notices. But that makes it even more relevant in my opinion: hard-to-reproduce bugs are the worst ;)
Not sure if that would make it 100% correct, as there's the difference between the system time and the time the filesystem gets updated (i.e. is the filesystem operation atomic? And are all the clocks synchronized?). |
100% is probably a bit optimistic. It depends on the timestamp granularity of the filesystem being used, and on how the filesystem decides to record the mtime when writing files. If we let ninja record the most granular timestamp the OS can provide when running a command for an output, we can dramatically (and, depending on the filesystem, completely) remove the race condition that this PR tries to solve. Your point about the difference between the system time and the time the filesystem gets updated shouldn't matter with my second solution: the system timestamp (which is recorded in the deplog) is taken before the job is even started, and the only time that matters on the other end is the time the filesystem uses to stamp the mtime attribute.
Sounds good :) |
Ok, I'll work on adjusting this PR to go down that route. Stay tuned. |
@jhasse I plan on getting back to this soon. Life kinda got in the way last summer and this fell down the priority list. |
Is it a guarantee that the timestamps used by the file system match the system time? I wouldn't naively expect them to be necessarily related -- e.g. stuff like different timestamp resolution, network file systems, delayed writes and so on. I think if you're making a decision based on fs-reported timestamps the correct thing is to use those timestamps as if they were opaque. (disclaimer: haven't touched ninja in like 8 years at this point)
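In code, treating filesystem timestamps as opaque reduces to comparing for equality against a previously recorded value rather than ordering against "now". A minimal sketch of the discipline (my illustration, not anything in the PR):

```cpp
#include <cstdint>

using TimeStamp = int64_t;

// Opaque-timestamp discipline: never ask "is the input newer than the
// output?", only "is the input's mtime the same value we recorded the
// last time we built?". Any change, forwards or backwards, means the
// edge must be rebuilt.
bool NeedsRebuild(TimeStamp recorded_input_mtime,
                  TimeStamp current_input_mtime) {
  return current_input_mtime != recorded_input_mtime;
}
```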
@evmar I've had a while to think about this problem again. I agree that using a system time to compare against filesystem times is probably not the ideal solution, for many reasons, one of which is (as you mention) that we can't reliably use a filesystem-reported mtime, since it could be the system time of another machine, or not a time at all. Avery Pennarun has a great summary of the problems with mtime comparisons for build tools in his blog: https://apenwarr.ca/log/20181113 I think I'm going to close this PR and think about possible ways to incorporate some of those ideas.
My take: the best option is to hash the inputs -- not their contents, but rather the names and mtimes of the inputs treated as a blob. Then if anything changes (time forwards or backwards) we rebuild.
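A minimal sketch of that idea, hashing the (name, mtime) pairs of all inputs as one blob. FNV-1a stands in here for whatever hash a real implementation would pick; all names are illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a, used here only as a stand-in hash.
uint64_t Fnv1a(const void* data, size_t len, uint64_t h) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  for (size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 0x100000001b3ULL;  // 64-bit FNV prime
  }
  return h;
}

struct Input {
  std::string path;
  int64_t mtime;
};

// Hash the names and mtimes of all inputs as one blob. If any input's
// mtime moves forwards *or* backwards, or the input set itself
// changes, the signature changes and the edge rebuilds.
uint64_t InputSignature(const std::vector<Input>& inputs) {
  uint64_t h = 0xcbf29ce484222325ULL;  // 64-bit FNV offset basis
  for (const Input& in : inputs) {
    h = Fnv1a(in.path.data(), in.path.size(), h);
    h = Fnv1a(&in.mtime, sizeof(in.mtime), h);
  }
  return h;
}
```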
That's essentially where I'm going with it, but I might hash/track more than the name and mtime of inputs; I'm considering inode number and file size as well. The issue I'm still trying to resolve is how to do initial dependency discovery. The first time ninja builds an output, it doesn't know all its dependencies yet. The compiler can generate this information (e.g. via depfiles), but it isn't available until the first build completes, so the first run remains exposed. I want to make the final solution resilient against all of these types of issues.
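A sketch of the per-input signature this describes, extended with inode number and file size (POSIX `stat()` for illustration; Windows would need its own equivalent, and all names here are hypothetical):

```cpp
#include <sys/stat.h>

#include <cstdint>
#include <string>

// Per-input signature: mtime plus inode and size, so that content
// swaps that happen to preserve the mtime are still likely noticed.
struct FileSig {
  int64_t mtime;  // seconds; a real version would use st_mtim for ns
  uint64_t inode;
  uint64_t size;
};

// Stat a path and fill in its signature; returns false if the file
// can't be stat'd (e.g. it doesn't exist yet).
bool StatSig(const std::string& path, FileSig* out) {
  struct stat st;
  if (stat(path.c_str(), &st) != 0)
    return false;
  out->mtime = static_cast<int64_t>(st.st_mtime);
  out->inode = static_cast<uint64_t>(st.st_ino);
  out->size = static_cast<uint64_t>(st.st_size);
  return true;
}
```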
@jhasse I have another version of this PR that resolves this issue rather cleanly. It can't be perfect (due to the nature of implicit dependency discovery via depfiles), but it gets close, especially when used in combination with @bradking's related work. Do you want me to force-push this new implementation to this PR, or close this one and create a new one?
Close this one and create a new one :) |
This adds support for recording an edge's most recent input file's mtime in the build log. This fixes #1162, by giving ninja a better metric for determining whether an edge's outputs are dirty.
If an edge's output files' mtimes are compared to the most recent input's mtime, edges might be calculated as clean even if they are actually dirty. If an input to the edge is updated while the edge is running its rule to produce its outputs, before the outputs are written to disk, then subsequent runs will think that the outputs are newer than the inputs, even though the inputs have actually been updated and may differ from what was used to produce those outputs. This can be disastrous if the resulting object files still link, since some of the object files might be out of date.
Instead of comparing output mtimes, ninja will add the most recent input's mtime (calculated when the edge first computes its dirty status) to the build log file when an edge completes. On subsequent runs, ninja will compare this value against the edge's most recent input's mtime to determine whether the outputs are dirty. This is provably more correct and safer, since outputs are guaranteed to actually be up to date.
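Concretely, the difference between the old and new dirty checks looks roughly like this (illustrative names, not the actual patch):

```cpp
#include <cstdint>

using TimeStamp = int64_t;  // -1 means "no valid record in the log"

// Old check: compare the output's mtime against the newest input.
// An input edited mid-build slips through, because the output is
// written (and stamped) after the edit.
bool DirtyOld(TimeStamp output_mtime, TimeStamp newest_input_mtime) {
  return output_mtime < newest_input_mtime;
}

// New check: compare the newest input against the input mtime that
// was recorded in the build log when the edge's dirty status was
// first computed. If the inputs have moved past that point, the
// outputs were built from stale inputs and must be rebuilt.
bool DirtyNew(TimeStamp logged_input_time, TimeStamp newest_input_mtime) {
  if (logged_input_time < 0)
    return true;  // no record: assume dirty
  return newest_input_mtime > logged_input_time;
}
```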
A test is included for recording the correct input_time in the log file, ensuring that subsequent ninja runs utilize that value correctly.