KEMBAR78
Implement `git lfs clone` by sinbad · Pull Request #988 · git-lfs/git-lfs · GitHub
Skip to content

Conversation

@sinbad
Copy link
Contributor

@sinbad sinbad commented Feb 9, 2016

Adds a new command git lfs clone which allows for more efficient cloning of LFS repositories by suppressing filters during clone/checkout then doing git lfs pull afterwards. Automatically suppresses error messages generated by this process.

Addresses #931, thanks to @larsxschneider for the extensive analysis, this is 100% based on his findings.

@sinbad
Copy link
Contributor Author

sinbad commented Feb 9, 2016

Well, that's interesting, Travis build failed on Mac (which is what I'm using) but not Linux. I think it's because the version of git in use is 1.9.x ("Apple Git-50" - which is really old, the latest Apple-supplied git is 2.5.4).

What's the oldest version of Git that LFS officially supports? We should probably require that in Travis.

@sinbad
Copy link
Contributor Author

sinbad commented Feb 9, 2016

I've updated the Travis config so it grabs a newer version of git via brew on OS X; this is useful actually since it will now be running the worktree tests and a couple other things that wouldn't run on git 1.9.x. I saw an issue with the cred helper test not being able to be turned off, I've tried to deal with that although it seems to have gone from the latest build.

@technoweenie
Copy link
Contributor

What's the oldest version of Git that LFS officially supports? We should probably require that in Travis.

I think the oldest version is 1.8.x, because of pre-push hook support. Though technically, we can support lower versions then that. They just may want to ensure their host has a pre-receive hook that makes sure LFS files are uploaded before accepting commits with LFS pointers.

If only the clone command is failing, maybe just that command should enforce a newer git version?

@technoweenie technoweenie mentioned this pull request Feb 9, 2016
17 tasks
@technoweenie
Copy link
Contributor

🤘

OLD AND BUSTED:

$ time git clone https://git-server/technoweenie/lfs-pull-bug
Cloning into 'lfs-pull-bug'...
remote: Counting objects: 513, done.
remote: Compressing objects: 100% (508/508), done.
remote: Total 513 (delta 0), reused 513 (delta 0), pack-reused 0
Receiving objects: 100% (513/513), 70.20 KiB | 0 bytes/s, done.
Checking connectivity... done.
Downloading a.bin (5 B)
Checking out files: 100% (506/506), done.
git clone https://git-server/technoweenie/lfs-pull-bug  39.03s user 23.58s system 22% cpu 4:43.24 total

NEW HOTNESS

$ time git lfs clone https://git-server/technoweenie/lfs-pull-bug
Cloning into 'lfs-pull-bug'...
Git LFS: (501 of 501 files) 0 B / 1.85 KB
git lfs clone https://git-server/technoweenie/lfs-pull-bug  6.63s user 15.09s system 93% cpu 23.223 total

Looks like this is leaving out some of the output. Also, this doesn't handle cases where git clone fails:

$ git lfs clone
You must specify a repository to clone.
usage: git clone [<options>] [--] <repo> [<dir>]
    -v, --verbose         be more verbose
    -q, --quiet           be more quiet
    --progress            force progress reporting
    -n, --no-checkout     don't create a checkout
    --bare                create a bare repository
    --mirror              create a mirror repository (implies bare)
    -l, --local           to clone from a local repository
    --no-hardlinks        don't use local hardlinks, always copy
    -s, --shared          setup as shared repository
    --recursive           initialize submodules in the clone
    --recurse-submodules  initialize submodules in the clone
    --template <template-directory>
                          directory from which templates will be used
    --reference <repo>    reference repository
    --dissociate          use --reference only while cloning
    -o, --origin <name>   use <name> instead of 'origin' to track upstream
    -b, --branch <branch>
                          checkout <branch> instead of the remote's HEAD
    -u, --upload-pack <path>
                          path to git-upload-pack on the remote
    --depth <depth>       create a shallow clone of that depth
    --single-branch       clone only one branch, HEAD or --branch
    --separate-git-dir <gitdir>
                          separate git dir from working tree
    -c, --config <key=value>
                          set config inside the new repository
Error(s) during clone
Unable to log panic to : mkdir : no such file or directory

git-lfs/1.1.1 (GitHub; darwin amd64; go 1.5.3; git 7e8fa62)
git version 2.5.4 (Apple Git-61)

$ git-lfs clone
Error(s) during clone

git clone failed: exit status 129
goroutine 1 [running]:
github.com/github/git-lfs/lfs.Stack(0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/lfs/errors.go:557 +0x80
github.com/github/git-lfs/commands.logPanicToWriter(0xd641c0, 0xc820082010, 0xd60000, 0xc820073cc0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:184 +0xf7f
github.com/github/git-lfs/commands.logPanic(0xd60000, 0xc820073cc0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:148 +0x421
github.com/github/git-lfs/commands.handlePanic(0xd60000, 0xc820073cc0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:123 +0x4e
github.com/github/git-lfs/commands.LoggedError(0xd60000, 0xc820073cc0, 0x54b6c0, 0x15, 0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:73 +0x82
github.com/github/git-lfs/commands.Panic(0xd60000, 0xc820073cc0, 0x54b6c0, 0x15, 0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:83 +0x5d
github.com/github/git-lfs/commands.cloneCommand(0x75f000, 0x78a9a0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/commands/command_clone.go:26 +0xa7
github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra.(*Command).execute(0x75f000, 0x78a9a0, 0x0, 0x0, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra/command.go:477 +0x403
github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra.(*Command).Execute(0x762260, 0x0, 0x0)
    /Users/rick/go/src/github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra/command.go:551 +0x46a
github.com/github/git-lfs/commands.Run()
    /Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:88 +0x23
main.main()
    /Users/rick/go/src/github.com/github/git-lfs/git-lfs.go:34 +0x12e

ENV:
LocalWorkingDir=
LocalGitDir=
LocalGitStorageDir=
LocalMediaDir=
TempDir=/var/folders/r6/qpmbzj194fj2ljwylsdc07hr0000gn/T/git-lfs
ConcurrentTransfers=3
BatchTransfer=true
GIT_LFS_TEST_DIR=/Users/rick/p/git-lfs-temp
GIT_LFS_TEST_MAXPROCS=4

Added this to Git LFS v1.2:

@larsxschneider
Copy link
Member

@sinbad Thanks for the credit 😄

I recently updated Git on the Linux build, too:
#980

@technoweenie
Maybe we should add a Travis Matrix build. That could run the tests for different Git versions. See here for an example that compiles the source with different compiler versions:
https://github.com/genbattle/dkm/blob/29067b6e257016e3fbf9afe2773bada2fc333212/.travis.yml#L6-L41

.travis.yml Outdated
email: false
before_install:
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew update ; fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install git ; fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to use multi line you can do it like this:
https://github.com/git/git/blob/master/.travis.yml#L37-L69

@technoweenie
Copy link
Contributor

Maybe we should add a Travis Matrix build.

Yes, I was going to suggest that in your travis PR actually.

git/git.go Outdated
strings.Contains(s, "error: cannot run : No such file or directory") ||
strings.Contains(s, "warning: Clone succeeded, but checkout failed") ||
strings.Contains(s, "You can inspect what was checked out with 'git status'") ||
strings.Contains(s, "retry the checkout") ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should filter for multi line message. Because individually these error messages could be legit in a different context.

E.g. here I tried to match on the clearly wrong error: external filter failed and then delete the lines around that:
https://gist.github.com/larsxschneider/85652462dcb442cc9344#file-git-lfsclone-sh-L7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe, I wasn't keen on assuming the number of lines surrounding an error. Maybe rather than trying to be super-precise about parsing sequences of error text (which might change after all), perhaps it's best just to suppress all the known errors like this in the output but to write them all to the git-lfs logs. If git clone fails it'll stop with a non-zero anyway and we could just suggest running 'git lfs logs last' to get the full story. This is the same pattern we use elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git-lfs logs sounds good! Plus it won't be a problem for future git versions anyways... the patch to suppress the errors just landed in Git master 🎉 git/git@1a8630d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the patch to suppress the errors just landed in Git master 🎉 git/git@1a8630d

W00t, congrats! 🤘

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the end I sent the filtered stderr output to the trace so it's visible when you use GIT_TRACE=1 and it appears in context.

@larsxschneider
Copy link
Member

Maybe we should add a Travis Matrix build.

Yes, I was going to suggest that in your travis PR actually.


Reply to this email directly or view it on GitHub.

here you go #990 :-)

@sinbad
Copy link
Contributor Author

sinbad commented Feb 10, 2016

If only the clone command is failing, maybe just that command should enforce a newer git version?

Yeah, I was hoping our min version was higher :) To find exactly which git version started tolerating this approach, either I'm going to have to binary search git versions until I find which one it breaks at, or I need to find the error in the git source & see when it changed. Kinda tedious so I didn't want to do it unnecessarily, but I'll sort that out.

Looks like this is leaving out some of the output.

I think this might be because the stdin isn't attached & git doesn't think it's interactive so dials back the output. I'll try connecting stdin to see if that works.

Also, this doesn't handle cases where git clone fails:

I expect nothing but perfection!!11 ;) Oops, will fix.

@sinbad
Copy link
Contributor Author

sinbad commented Feb 12, 2016

Looks like this is leaving out some of the output.

I think this might be because the stdin isn't attached & git doesn't think it's interactive so dials back the output. I'll try connecting stdin to see if that works.

Turns out I was almost right, it's that the progress output is suppressed by Git when it detects a non-TTY, but unfortunately just attaching stdin isn't enough. To get this output I think I'd have to call clone via a pseudo-TTY implementation like https://github.com/kr/pty - I'll play with it a bit but it may be overkill. Other than that detail I think this is ready.

[edit]For future reference I did try reading the raw stderr (definitely where the progress is sent per git source) instead of using buffered I/O it in case that helped, but it didn't. The only thing that worked was assigning os.Stderr (a real TTY), or changing the git source to force the progress output to be sent to stderr in all cases regardless.

@sinbad
Copy link
Contributor Author

sinbad commented Feb 12, 2016

I found we already had a dependency on kr/pty via kr/pretty so I've used it to provide a pseudo-tty to git clone, output is now complete although still slightly delayed compared to direct on terminal. Better than before though.

git/git.go Outdated
// We can only support this on git version 2.2.0+
// Before that, setting filters to blank would break clone
if !Config.IsGitVersionAtLeast("2.2.0") {
return errors.New("git lfs clone requires git version 2.2.0+")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you set the smudge filter to cat in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Done.

@sinbad
Copy link
Contributor Author

sinbad commented Feb 22, 2016

Any chance of a merge on this one now @technoweenie? :)

@technoweenie
Copy link
Contributor

I was planning on a quick 1.1.2 release for some bug fixes. I think this would be more appropriate in 1.2. Maybe we can just cut 1.2's scope and ship that instead though :)

@sinbad
Copy link
Contributor Author

sinbad commented Feb 22, 2016

Maybe create a separate maintenance branch for 1.1.x? Then we could start pulling future 1.2 stuff together in master.

@technoweenie
Copy link
Contributor

Ah yeah, we even have a backport script for that :)

technoweenie added a commit that referenced this pull request Feb 22, 2016
@technoweenie technoweenie merged commit e17f239 into git-lfs:master Feb 22, 2016
@sinbad sinbad deleted the git-lfs-clone branch February 23, 2016 09:18
@rassilon
Copy link

Here's another vote for having a release with this fix in it. I ended up building the Windows installer locally of the current master contents at the time.

Thanks!

@technoweenie
Copy link
Contributor

@@ -0,0 +1,70 @@
package commands

import (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nguyenvietson

0day-ci pushed a commit to 0day-ci/git that referenced this pull request Jun 27, 2016
Hi,

I found a way to make Git LFS faster up to a factor of 100x in
repositories with a large number of Git LFS files. I am looking
for comments if my approach would be acceptable by the Git community.

## What is Git LFS?
Git LFS [1] is an extension to Git that handles large files for Git
repositories. The project gained quite some momentum as almost all major
Git hosting services support it (GitHub [1], Atlassian Bitbucket [2],
GitLab [4]).

## What is the problem with Git LFS?
Git LFS is an application that is executed via Git clean/smudge filter.
The process invocation of these filters requires noticeable time (especially
on Windows) even if the Git LFS executable only accesses its local cache.

Based on my previous findings [5] Steve Streeting (@sinbad) improved the
clone times of Git LFS repositories with a lot of files by a factor of 10
or more [6][7].

Unfortunately that fix helps only with cloning. Any local Git operation
that invokes the clean/smudge filter (e.g. switching branches) is still
slow. Even on the Git mailing list a user reported that issue [8].

## Proposed solution
Git LFS caches its objects under .git/lfs/objects. Most of the time Git
LFS objects are already available in the cache (e.g. if you switch branches
back and forth). I implemented these "cache hits" natively in Git.
Please note that this implementation is just a quick and dirty proof of
concept. If the Git community agrees that this kind of approach would be
acceptable then I will start to work on a proper patch series with cross
platform support and unit tests.

## Performance tests
I executed both test runs on a 2,5 GHz Intel Core i7 with SSD and OS X.
A test run is the consecutive execution of four Git commands:
 1. clone the repo
 2. checkout to the "removed-files" branch
 3. timed: checkout the "master" branch
 4. timed: checkout "removed-files" branch

Test command:
set -x; git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo; cd repo; git checkout removed-files; time git checkout master; time git checkout removed-files

I compiled Git with the following flags:
NO_GETTEXT=YesPlease NEEDS_SSL_WITH_CRYPTO=YesPlease make -j 8 CFLAGS="-I/usr/local/opt/openssl/include" LDFLAGS="-L/usr/local/opt/openssl/lib"

### TEST RUN A -- Default Git 2.9 (ab7797d) and Git LFS 1.2.1
+ git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo
Cloning into 'repo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.77 MiB/s, done.
Checking connectivity... done.
Checking out files: 100% (15001/15001), done.
Git LFS: (15000 of 15000 files) 0 B / 77.04 KB
+ cd repo
+ git checkout removed-files
Branch removed-files set up to track remote branch removed-files from origin.
Switched to a new branch 'removed-files'
+ git checkout master
Checking out files: 100% (12000/12000), done.
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

real    6m2.979s
user    2m39.066s
sys 2m41.610s
+ git checkout removed-files
Switched to branch 'removed-files'
Your branch is up-to-date with 'origin/removed-files'.

real    0m1.310s
user    0m0.385s
sys 0m0.881s

### TEST RUN B -- Default Git 2.9 with native LFS cache and Git LFS 1.2.1
https://github.com/larsxschneider/git/tree/lfs-cache
+ git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo
Cloning into 'repo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.44 MiB/s, done.
Checking connectivity... done.
Git LFS: (15001 of 15000 files) 0 B / 77.04 KB
+ cd repo
+ git checkout removed-files
Branch removed-files set up to track remote branch removed-files from origin.
Switched to a new branch 'removed-files'
+ git checkout master
Checking out files: 100% (12000/12000), done.
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

real    0m2.267s
user    0m0.295s
sys 0m1.948s
+ git checkout removed-files
Switched to branch 'removed-files'
Your branch is up-to-date with 'origin/removed-files'.

real    0m0.715s
user    0m0.072s
sys 0m0.672s

### Results
Default Git:                      6m2.979s + 0m1.310s = 364s
Git with native LFS cache access: 0m2.267s + 0m0.715s = 4s

The native cache solution is almost 100x faster when switching branches
on my local machine with a test repository containing 15,000 Git LFS files.
Based on my previous experience with Git LFS clone I expect even more
dramatic results on Windows.

Thanks,
Lars

[1] https://git-lfs.github.com/
[2] https://github.com/blog/1986-announcing-git-large-file-storage-lfs
[3] http://blogs.atlassian.com/2016/02/git-lfs-for-designers-game-developers-architects/
[4] https://about.gitlab.com/2015/11/23/announcing-git-lfs-support-in-gitlab/
[5] git-lfs/git-lfs#931 (comment)
[6] git-lfs/git-lfs#988
[7] https://developer.atlassian.com/blog/2016/04/git-lfs-12-clone-faster/
[8] http://article.gmane.org/gmane.comp.version-control.git/297809
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants