KEMBAR78
Fix for #1860: Closing stdout pipe before function return by monitorjbl · Pull Request #1861 · git-lfs/git-lfs · GitHub
Skip to content

Conversation

monitorjbl
Copy link
Contributor

No description provided.

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be safe since the body of the function consumes the reader until scanner.Scan() returns false (meaning the pipe is closed, and the command has already terminated). In the early-exit case, this is safe as well since we don't care about the remainder of the data.

Going to hold off on giving an explicit approval on this since @technoweenie worked on this recently.

@technoweenie
Copy link
Contributor

This looks good, but now I'd really like some tests for it :) We should be able to come up with a prune test to trigger #1860.

@monitorjbl: If you want to write a test, that'd be awesome. If not, we'll write one and then merge this. However, I do ask that you sign our CLA :)

@monitorjbl
Copy link
Contributor Author

@technoweenie I'll give writing a test a shot. I'm gonna have to figure out how to generate "old" commits in a repo if you want an integration test like this though.

I did sign the CLA, I'm not sure why the check didn't update. When I click on it again, it says I've already signed it:

image

@technoweenie
Copy link
Contributor

The prune test uses lfstest-testutils to create commits from a custom json document. You can create old commits like this.

Are you logging into the cla app as a different github user? That's the only thing I can think of. I'll see if I can peek at the server too.

@monitorjbl
Copy link
Contributor Author

I was on the right github account, but I made the commit on my work laptop so the email address in the commit history was wrong. Looks like changing that did the trick!

@monitorjbl
Copy link
Contributor Author

@technoweenie It doesn't look like your testing library actually sets the dates on the commits. I ran the pre-existing tests in test/prune-verify.sh with the following options:

KEEPTRASH=1 GIT_LFS_TEST_DIR=/tmp/git-lfs-tests ./test/test-prune.sh 

One of the tests in here ("prune verify") looks like its supposed to be making commits that are up to 50 days old. I verified that it outputs JSON that looks like this:

[
  {
    "CommitDate":"2016-11-27T15:48:28Z",
    "Files":[
      {"Filename":"file.dat","Size":28, "Data":"Content for commit 1 (prune)"}]
  },
  {
    "CommitDate":"2016-12-07T15:48:28Z",
    "Files":[
      {"Filename":"file.dat","Size":42, "Data":"Content for commit 2 (prune - fail verify)"}]
  },
  {
    "CommitDate":"2016-12-12T15:48:28Z",
    "Files":[
      {"Filename":"file.dat","Size":28, "Data":"Content for commit 3 (prune)"}]
  },
  {
    "CommitDate":"2016-12-22T15:48:28Z",
    "Files":[
      {"Filename":"file.dat","Size":12, "Data":"HEAD content"}]
  }
]

So, I went to the repo that was generated for one of the tests and looked at the real git log:

$ cd /tmp/git-lfs-tests/test-prune.sh-74530/clone_prune_verify
$ git log

commit 284e764360c0b538a723eb2af6f82e0de51744bb
Author: Git LFS Tests <git-lfs@example.com>
Date:   Mon Jan 16 10:41:10 2017 -0500

    Test commit 3

commit fee71f23920ff0a1ed402ed7f27afe68e45266d0
Author: Git LFS Tests <git-lfs@example.com>
Date:   Mon Jan 16 10:41:10 2017 -0500

    Test commit 2

commit ec75c6e3984033d58c63e1d7a34e5937784ce5de
Author: Git LFS Tests <git-lfs@example.com>
Date:   Mon Jan 16 10:41:10 2017 -0500

    Test commit 1

commit 0459cbd1dc89c3048e65f49cd7469e080858974d
Author: Git LFS Tests <git-lfs@example.com>
Date:   Mon Jan 16 10:41:10 2017 -0500

    Test commit 0

The dates are all the current date. I'm running this on OSX, but it looks like you have a bit of shell script in testhelpers.sh to handle this case.

I'd originally assumed that bufio.Scanner was exhausting the stdout pipe on small repos even if the read loop broke early. That may still be the case, but I wonder if the commit dates not actually being set is why this issue was not found before?

@monitorjbl
Copy link
Contributor Author

monitorjbl commented Jan 16, 2017

Well, scratch that @technoweenie . TIL about GIT_AUTHOR_DATE and GIT_COMMITTER_DATE. Looks like the for-each-ref call that gets made only looks at the commit date. Still, it would be nice if the commits actually looked correct from git log. If you don't mind, I'd like to add the following change to test/testutils.go as part of this PR:

diff --git a/test/testutils.go b/test/testutils.go
index c2bb3e21..bfae1260 100644
--- a/test/testutils.go
+++ b/test/testutils.go
@@ -298,8 +298,10 @@ func commitAtDate(atDate time.Time, committerName, committerEmail, msg string) e
        // set GIT_COMMITTER_DATE environment var e.g. "Fri Jun 21 20:26:41 2013 +0900"
        if atDate.IsZero() {
                env = append(env, "GIT_COMMITTER_DATE=")
+               env = append(env, "GIT_AUTHOR_DATE=")
        } else {
                env = append(env, fmt.Sprintf("GIT_COMMITTER_DATE=%v", git.FormatGitDate(atDate)))
+               env = append(env, fmt.Sprintf("GIT_AUTHOR_DATE=%v", git.FormatGitDate(atDate)))
        }
        cmd.Env = env
        out, err := cmd.CombinedOutput()

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work, thanks @monitorjbl 🤘 Nice "git author" catch in the test utils.

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this. I started leaving review comments, and then noticed something in the (*exec.Cmd) Wait() error func:

c.closeDescriptors(c.closeAfterWait)

If you look in that exec.go file, you can search for closeAfterWait, and see how it adds pipes to that slice, and then closes them after Start() and Wait(). So, it appears that this entire PR isn't even needed. However, the test you wrote in this PR definitely hangs (or at least takes way longer than I'm willing to let a test run :feelsgood:).

My review feedback should still be valid. However, I want to try to dig further into this to understand why this PR is even needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this just Cmd? So *subprocess.Cmd is a wrapper around *exec.Cmd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding the *exec.Cmd as an private field, you can simply embed it in this type.

package subprocess

type Cmd struct {
  *exec.Cmd
}

When it has no field name, it means this new struct is embedding the *exec.Cmd. Embedding means any public fields and functions on *exec.Cmd are also on *subprocess.Cmd.

cmd := &subprocess.Cmd{Cmd: &exec.Command("name", "arg")}
cmd.Start() // go treats this as `cmd.Cmd.Start()`

This should let you get rid of those other public fields like Path, Args, Env, etc.

The other thing about embedding is it lets you override the functions.

func (c *Cmd) Start() error {
  fmt.Println("Starting")
  err := c.Cmd.Start()
  fmt.Println("started", err)
  return err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tried this the first time since I'd read about Go's inheritance-like ability, but I got some pointer errors because I didn't realize you had to initialize the child struct like this. Thanks for the tip, I got it to work!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke a bit too soon. I need to be able to override the *Pipe() functions and call back to the original exec.Cmd function to do so. Fortunately, I can still include a reference to the original exec.Cmd struct while embedding. Cuts the code down quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that was a rollercoaster of Go education. I think I got it working the way you originally suggested now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only *exec.Cmd function that needs to be overwritten. Both Output() and CombinedOutput() call Run(), and Run() calls Wait().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the *Pipe() functions need to be overridden as well. There's no way to get ahold of the pipes created by exec.Cmd's functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of tracking just one of each type, why not just track it as a slice of io.Closer? Functions like StdoutPipe() can just append the pipe to the slice, and Wait()can just iterate through the slice, callingClose()` on each item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

@monitorjbl
Copy link
Contributor Author

I would guess the hang happens at line 434 before the c.closeDescriptors(c.closeAfterWait) gets called. The process can't complete while the pipe still has data.

I'll get these fixes in, thanks for the pointers! Still somewhat new to Go, so any advice is appreciated.

@technoweenie
Copy link
Contributor

I would guess the hang happens at line 434 before the c.closeDescriptors(c.closeAfterWait) gets called.

Great catch 🤘 Ok carry on!

@monitorjbl monitorjbl force-pushed the master branch 2 times, most recently from ca5abc8 to 318efd4 Compare January 19, 2017 14:37
Created subprocess.Cmd wrapper to autoclose any created pipes
Removing original fix for git-lfs#1860
@technoweenie
Copy link
Contributor

Nice, this looks much better 🤘 Thanks for tackling this. Hope your first Go experience was a good one :)

chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jul 15, 2024
In previous commits in this PR we introduced a new utility command
for our test suite, lfstest-genrandom, which reliably produces
random data on all the platforms we support (in several forms of
base64 encoding or as raw binary data), and then converted much of
our test suite to make use of this new command.

As noted in one of those commits, one test in our t/t-prune.sh test
script makes use of the "uuidgen" command to populate some Git LFS
objects with arbitrary text, but as this command does not exist in
the Git Bash environment on Windows, the blobs actually have empty
data in this case.  The test still passes because it doesn't depend
on the length of the random data, and the error exit code from the
missing "uuidgen" command is ignored.

We therefore revise this "prune verify large numbers of refs" test
(which was added in commit dd19c1b of
PR git-lfs#1861) to call our new lfstest-genrandom command instead of the
"uuidgen" command.  We use the --base64 flag and a slightly larger size
for the output base64-encoded string than the length of a UUID.

We also adjust the test so that it does not create the Git LFS objects
with sizes that do not match the length of their arbitrary content.
(Again, the test passed despite this inconsistency in its test data.)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Aug 24, 2024
In previous commits in this PR we introduced a new utility program
for our test suite, lfstest-genrandom, which reliably produces
random data on all the platforms we support (in several forms of
base64 encoding or as raw binary data), and then converted much of
our test suite to make use of this new program.

As noted in one of those commits, one test in our t/t-prune.sh test
script makes use of the "uuidgen" command to populate some Git LFS
objects with arbitrary text, but as this command does not exist in
the Git Bash environment on Windows, the blobs actually have empty
data in this case.  The test still passes because it doesn't depend
on the length of the random data, and the error exit code from the
missing "uuidgen" command is ignored.

We therefore revise this "prune verify large numbers of refs" test
(which was added in commit dd19c1b of
PR git-lfs#1861) to call our new lfstest-genrandom program instead of the
"uuidgen" command.  We use the --base64 flag and a slightly larger size
for the output base64-encoded string than the length of a UUID.

We also adjust the test so that it does not create the Git LFS objects
with sizes that do not match the length of their arbitrary content.
(Again, the test passed despite this inconsistency in its test data.)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Aug 30, 2024
In previous commits in this PR we introduced a new utility program
for our test suite, lfstest-genrandom, which reliably produces
random data on all the platforms we support (in several forms of
base64 encoding or as raw binary data), and then converted much of
our test suite to make use of this new program.

As noted in one of those commits, one test in our t/t-prune.sh test
script makes use of the "uuidgen" command to populate some Git LFS
objects with arbitrary text, but as this command does not exist in
the Git Bash environment on Windows, the blobs actually have empty
data in this case.  The test still passes because it doesn't depend
on the length of the random data, and the error exit code from the
missing "uuidgen" command is ignored.

We therefore revise this "prune verify large numbers of refs" test
(which was added in commit dd19c1b of
PR git-lfs#1861) to call our new lfstest-genrandom program instead of the
"uuidgen" command.  We use the --base64 flag and a slightly larger size
for the output base64-encoded string than the length of a UUID.

We also adjust the test so that it does not create the Git LFS objects
with sizes that do not match the length of their arbitrary content.
(Again, the test passed despite this inconsistency in its test data.)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Sep 18, 2024
In previous commits in this PR we introduced a new utility program
for our test suite, lfstest-genrandom, which reliably produces
random data on all the platforms we support (in several forms of
base64 encoding or as raw binary data), and then converted much of
our test suite to make use of this new program.

As noted in one of those commits, one test in our t/t-prune.sh test
script makes use of the "uuidgen" command to populate some Git LFS
objects with arbitrary text, but as this command does not exist in
the Git Bash environment on Windows, the blobs actually have empty
data in this case.  The test still passes because it doesn't depend
on the length of the random data, and the error exit code from the
missing "uuidgen" command is ignored.

We therefore revise this "prune verify large numbers of refs" test
(which was added in commit dd19c1b of
PR git-lfs#1861) to call our new lfstest-genrandom program instead of the
"uuidgen" command.  We use the --base64 flag and a slightly larger size
for the output base64-encoded string than the length of a UUID.

We also adjust the test so that it does not create the Git LFS objects
with sizes that do not match the length of their arbitrary content.
(Again, the test passed despite this inconsistency in its test data.)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Sep 18, 2024
In previous commits in this PR we introduced a new utility program
for our test suite, lfstest-genrandom, which reliably produces random
data on all the platforms we support (in several forms of Base64
encoding, or as raw binary data), and then converted much of our test
suite to make use of this new program.

As noted in one of those commits, one test in our t/t-prune.sh test
script makes use of the uuidgen(1) command to populate some Git LFS
objects with arbitrary text, but as this command does not exist in
the Git Bash environment on Windows, the objects actually have empty
data on that platform.  The test still passes because it doesn't depend
on the length of the random data, and because the non-zero exit code
caused by the missing uuidgen(1) command is ignored as it appears earlier
in a pipeline whose final command (our lfstest-testutils program)
always succeeds.

We therefore revise this "prune verify large numbers of refs" test
(which was added in commit dd19c1b of
PR git-lfs#1861) to call our new lfstest-genrandom program instead of the
uuidgen(1) command.  We pass the --base64 flag to our program and request
a slightly larger size for the output Base64-encoded string than the
length of a UUID, just to ensure we are using a similar amount of random
data as before, although this doesn't actually affect the success or
failure of the test.

We also adjust the test so that it does not create the Git LFS objects
with sizes that do not match the length of their arbitrary content.
(Again, the test passed despite this inconsistency in its test data.)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Sep 19, 2024
In previous commits in this PR we introduced a new utility program
for our test suite, lfstest-genrandom, which reliably produces random
data on all the platforms we support (in several forms of Base64
encoding, or as raw binary data), and then converted much of our test
suite to make use of this new program.

As noted in one of those commits, one test in our t/t-prune.sh test
script makes use of the uuidgen(1) command to populate some Git LFS
objects with arbitrary text, but as this command does not exist in
the Git Bash environment on Windows, the objects actually have empty
data on that platform.  The test still passes because it doesn't depend
on the length of the random data, and because the non-zero exit code
caused by the missing uuidgen(1) command is ignored as it appears earlier
in a pipeline whose final command (our lfstest-testutils program)
always succeeds.

We therefore revise this "prune verify large numbers of refs" test
(which was added in commit dd19c1b of
PR git-lfs#1861) to call our new lfstest-genrandom program instead of the
uuidgen(1) command.  We pass the --base64 flag to our program and request
a slightly larger size for the output Base64-encoded string than the
length of a UUID, just to ensure we are using a similar amount of random
data as before, although this doesn't actually affect the success or
failure of the test.

We also adjust the test so that it does not create the Git LFS objects
with sizes that do not match the length of their arbitrary content.
(Again, the test passed despite this inconsistency in its test data.)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Sep 19, 2024
In previous commits in this PR we introduced a new utility program
for our test suite, lfstest-genrandom, which reliably produces random
data on all the platforms we support (in several forms of Base64
encoding, or as raw binary data), and then converted much of our test
suite to make use of this new program.

As noted in one of those commits, one test in our t/t-prune.sh test
script makes use of the uuidgen(1) command to populate some Git LFS
objects with arbitrary text, but as this command does not exist in
the Git Bash environment on Windows, the objects actually have empty
data on that platform.  The test still passes because it doesn't depend
on the length of the random data, and because the non-zero exit code
caused by the missing uuidgen(1) command is ignored as it appears earlier
in a pipeline whose final command (our lfstest-testutils program)
always succeeds.

We therefore revise this "prune verify large numbers of refs" test
(which was added in commit dd19c1b of
PR git-lfs#1861) to call our new lfstest-genrandom program instead of the
uuidgen(1) command.  We pass the --base64 flag to our program and request
a slightly larger size for the output Base64-encoded string than the
length of a UUID, just to ensure we are using a similar amount of random
data as before, although this doesn't actually affect the success or
failure of the test.

We also adjust the test so that it does not create the Git LFS objects
with sizes that do not match the length of their arbitrary content.
(Again, the test passed despite this inconsistency in its test data.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants