-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Implement git lfs clone
#988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Well, that's interesting, Travis build failed on Mac (which is what I'm using) but not Linux. I think it's because the version of git in use is 1.9.x ("Apple Git-50" - which is really old, the latest Apple-supplied git is 2.5.4). What's the oldest version of Git that LFS officially supports? We should probably require that in Travis. |
|
I've updated the Travis config so it grabs a newer version of git via brew on OS X; this is useful actually since it will now be running the worktree tests and a couple other things that wouldn't run on git 1.9.x. I saw an issue with the cred helper test not being able to be turned off, I've tried to deal with that although it seems to have gone from the latest build. |
I think the oldest version is 1.8.x, because of If only the clone command is failing, maybe just that command should enforce a newer git version? |
|
🤘 OLD AND BUSTED: $ time git clone https://git-server/technoweenie/lfs-pull-bug
Cloning into 'lfs-pull-bug'...
remote: Counting objects: 513, done.
remote: Compressing objects: 100% (508/508), done.
remote: Total 513 (delta 0), reused 513 (delta 0), pack-reused 0
Receiving objects: 100% (513/513), 70.20 KiB | 0 bytes/s, done.
Checking connectivity... done.
Downloading a.bin (5 B)
Checking out files: 100% (506/506), done.
git clone https://git-server/technoweenie/lfs-pull-bug 39.03s user 23.58s system 22% cpu 4:43.24 totalNEW HOTNESS $ time git lfs clone https://git-server/technoweenie/lfs-pull-bug
Cloning into 'lfs-pull-bug'...
Git LFS: (501 of 501 files) 0 B / 1.85 KB
git lfs clone https://git-server/technoweenie/lfs-pull-bug 6.63s user 15.09s system 93% cpu 23.223 totalLooks like this is leaving out some of the output. Also, this doesn't handle cases where $ git lfs clone
You must specify a repository to clone.
usage: git clone [<options>] [--] <repo> [<dir>]
-v, --verbose be more verbose
-q, --quiet be more quiet
--progress force progress reporting
-n, --no-checkout don't create a checkout
--bare create a bare repository
--mirror create a mirror repository (implies bare)
-l, --local to clone from a local repository
--no-hardlinks don't use local hardlinks, always copy
-s, --shared setup as shared repository
--recursive initialize submodules in the clone
--recurse-submodules initialize submodules in the clone
--template <template-directory>
directory from which templates will be used
--reference <repo> reference repository
--dissociate use --reference only while cloning
-o, --origin <name> use <name> instead of 'origin' to track upstream
-b, --branch <branch>
checkout <branch> instead of the remote's HEAD
-u, --upload-pack <path>
path to git-upload-pack on the remote
--depth <depth> create a shallow clone of that depth
--single-branch clone only one branch, HEAD or --branch
--separate-git-dir <gitdir>
separate git dir from working tree
-c, --config <key=value>
set config inside the new repository
Error(s) during clone
Unable to log panic to : mkdir : no such file or directory
git-lfs/1.1.1 (GitHub; darwin amd64; go 1.5.3; git 7e8fa62)
git version 2.5.4 (Apple Git-61)
$ git-lfs clone
Error(s) during clone
git clone failed: exit status 129
goroutine 1 [running]:
github.com/github/git-lfs/lfs.Stack(0x0, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/lfs/errors.go:557 +0x80
github.com/github/git-lfs/commands.logPanicToWriter(0xd641c0, 0xc820082010, 0xd60000, 0xc820073cc0)
/Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:184 +0xf7f
github.com/github/git-lfs/commands.logPanic(0xd60000, 0xc820073cc0, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:148 +0x421
github.com/github/git-lfs/commands.handlePanic(0xd60000, 0xc820073cc0, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:123 +0x4e
github.com/github/git-lfs/commands.LoggedError(0xd60000, 0xc820073cc0, 0x54b6c0, 0x15, 0x0, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:73 +0x82
github.com/github/git-lfs/commands.Panic(0xd60000, 0xc820073cc0, 0x54b6c0, 0x15, 0x0, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:83 +0x5d
github.com/github/git-lfs/commands.cloneCommand(0x75f000, 0x78a9a0, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/commands/command_clone.go:26 +0xa7
github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra.(*Command).execute(0x75f000, 0x78a9a0, 0x0, 0x0, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra/command.go:477 +0x403
github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra.(*Command).Execute(0x762260, 0x0, 0x0)
/Users/rick/go/src/github.com/github/git-lfs/vendor/_nuts/github.com/spf13/cobra/command.go:551 +0x46a
github.com/github/git-lfs/commands.Run()
/Users/rick/go/src/github.com/github/git-lfs/commands/commands.go:88 +0x23
main.main()
/Users/rick/go/src/github.com/github/git-lfs/git-lfs.go:34 +0x12e
ENV:
LocalWorkingDir=
LocalGitDir=
LocalGitStorageDir=
LocalMediaDir=
TempDir=/var/folders/r6/qpmbzj194fj2ljwylsdc07hr0000gn/T/git-lfs
ConcurrentTransfers=3
BatchTransfer=true
GIT_LFS_TEST_DIR=/Users/rick/p/git-lfs-temp
GIT_LFS_TEST_MAXPROCS=4Added this to Git LFS v1.2: |
|
@sinbad Thanks for the credit 😄 I recently updated Git on the Linux build, too: @technoweenie |
.travis.yml
Outdated
| email: false | ||
| before_install: | ||
| - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew update ; fi | ||
| - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install git ; fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to use multi line you can do it like this:
https://github.com/git/git/blob/master/.travis.yml#L37-L69
Yes, I was going to suggest that in your travis PR actually. |
git/git.go
Outdated
| strings.Contains(s, "error: cannot run : No such file or directory") || | ||
| strings.Contains(s, "warning: Clone succeeded, but checkout failed") || | ||
| strings.Contains(s, "You can inspect what was checked out with 'git status'") || | ||
| strings.Contains(s, "retry the checkout") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should filter for multi line message. Because individually these error messages could be legit in a different context.
E.g. here I tried to match on the clearly wrong error: external filter failed and then delete the lines around that:
https://gist.github.com/larsxschneider/85652462dcb442cc9344#file-git-lfsclone-sh-L7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe, I wasn't keen on assuming the number of lines surrounding an error. Maybe rather than trying to be super-precise about parsing sequences of error text (which might change after all), perhaps it's best just to suppress all the known errors like this in the output but to write them all to the git-lfs logs. If git clone fails it'll stop with a non-zero anyway and we could just suggest running 'git lfs logs last' to get the full story. This is the same pattern we use elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git-lfs logs sounds good! Plus it won't be a problem for future git versions anyways... the patch to suppress the errors just landed in Git master 🎉 git/git@1a8630d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the patch to suppress the errors just landed in Git master 🎉 git/git@1a8630d
W00t, congrats! 🤘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the end I sent the filtered stderr output to the trace so it's visible when you use GIT_TRACE=1 and it appears in context.
|
Yeah, I was hoping our min version was higher :) To find exactly which git version started tolerating this approach, either I'm going to have to binary search git versions until I find which one it breaks at, or I need to find the error in the git source & see when it changed. Kinda tedious so I didn't want to do it unnecessarily, but I'll sort that out.
I think this might be because the stdin isn't attached & git doesn't think it's interactive so dials back the output. I'll try connecting stdin to see if that works.
I expect nothing but perfection!!11 ;) Oops, will fix. |
ea4a398 to
0a6248b
Compare
Any lower than this and git errors when setting filter to blank value
Turns out I was almost right, it's that the progress output is suppressed by Git when it detects a non-TTY, but unfortunately just attaching stdin isn't enough. To get this output I think I'd have to call clone via a pseudo-TTY implementation like https://github.com/kr/pty - I'll play with it a bit but it may be overkill. Other than that detail I think this is ready. [edit]For future reference I did try reading the raw stderr (definitely where the progress is sent per git source) instead of using buffered I/O it in case that helped, but it didn't. The only thing that worked was assigning |
|
I found we already had a dependency on kr/pty via kr/pretty so I've used it to provide a pseudo-tty to |
git/git.go
Outdated
| // We can only support this on git version 2.2.0+ | ||
| // Before that, setting filters to blank would break clone | ||
| if !Config.IsGitVersionAtLeast("2.2.0") { | ||
| return errors.New("git lfs clone requires git version 2.2.0+") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you set the smudge filter to cat in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Done.
|
Any chance of a merge on this one now @technoweenie? :) |
|
I was planning on a quick 1.1.2 release for some bug fixes. I think this would be more appropriate in 1.2. Maybe we can just cut 1.2's scope and ship that instead though :) |
|
Maybe create a separate maintenance branch for 1.1.x? Then we could start pulling future 1.2 stuff together in master. |
|
Ah yeah, we even have a backport script for that :) |
Implement `git lfs clone`
|
Here's another vote for having a release with this fix in it. I ended up building the Windows installer locally of the current master contents at the time. Thanks! |
| @@ -0,0 +1,70 @@ | |||
| package commands | |||
|
|
|||
| import ( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nguyenvietson
Hi, I found a way to make Git LFS faster up to a factor of 100x in repositories with a large number of Git LFS files. I am looking for comments if my approach would be acceptable by the Git community. ## What is Git LFS? Git LFS [1] is an extension to Git that handles large files for Git repositories. The project gained quite some momentum as almost all major Git hosting services support it (GitHub [1], Atlassian Bitbucket [2], GitLab [4]). ## What is the problem with Git LFS? Git LFS is an application that is executed via Git clean/smudge filter. The process invocation of these filters requires noticeable time (especially on Windows) even if the Git LFS executable only accesses its local cache. Based on my previous findings [5] Steve Streeting (@sinbad) improved the clone times of Git LFS repositories with a lot of files by a factor of 10 or more [6][7]. Unfortunately that fix helps only with cloning. Any local Git operation that invokes the clean/smudge filter (e.g. switching branches) is still slow. Even on the Git mailing list a user reported that issue [8]. ## Proposed solution Git LFS caches its objects under .git/lfs/objects. Most of the time Git LFS objects are already available in the cache (e.g. if you switch branches back and forth). I implemented these "cache hits" natively in Git. Please note that this implementation is just a quick and dirty proof of concept. If the Git community agrees that this kind of approach would be acceptable then I will start to work on a proper patch series with cross platform support and unit tests. ## Performance tests I executed both test runs on a 2,5 GHz Intel Core i7 with SSD and OS X. A test run is the consecutive execution of four Git commands: 1. clone the repo 2. checkout to the "removed-files" branch 3. timed: checkout the "master" branch 4. timed: checkout "removed-files" branch Test command: set -x; git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo; cd repo; git checkout removed-files; time git checkout master; time git checkout removed-files I compiled Git with the following flags: NO_GETTEXT=YesPlease NEEDS_SSL_WITH_CRYPTO=YesPlease make -j 8 CFLAGS="-I/usr/local/opt/openssl/include" LDFLAGS="-L/usr/local/opt/openssl/lib" ### TEST RUN A -- Default Git 2.9 (ab7797d) and Git LFS 1.2.1 + git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo Cloning into 'repo'... warning: templates not found /Users/lars/share/git-core/templates remote: Counting objects: 15012, done. remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012 Receiving objects: 100% (15012/15012), 2.02 MiB | 1.77 MiB/s, done. Checking connectivity... done. Checking out files: 100% (15001/15001), done. Git LFS: (15000 of 15000 files) 0 B / 77.04 KB + cd repo + git checkout removed-files Branch removed-files set up to track remote branch removed-files from origin. Switched to a new branch 'removed-files' + git checkout master Checking out files: 100% (12000/12000), done. Switched to branch 'master' Your branch is up-to-date with 'origin/master'. real 6m2.979s user 2m39.066s sys 2m41.610s + git checkout removed-files Switched to branch 'removed-files' Your branch is up-to-date with 'origin/removed-files'. real 0m1.310s user 0m0.385s sys 0m0.881s ### TEST RUN B -- Default Git 2.9 with native LFS cache and Git LFS 1.2.1 https://github.com/larsxschneider/git/tree/lfs-cache + git lfs clone https://github.com/larsxschneider/lfstest-manyfiles.git repo Cloning into 'repo'... warning: templates not found /Users/lars/share/git-core/templates remote: Counting objects: 15012, done. remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012 Receiving objects: 100% (15012/15012), 2.02 MiB | 1.44 MiB/s, done. Checking connectivity... done. Git LFS: (15001 of 15000 files) 0 B / 77.04 KB + cd repo + git checkout removed-files Branch removed-files set up to track remote branch removed-files from origin. Switched to a new branch 'removed-files' + git checkout master Checking out files: 100% (12000/12000), done. Switched to branch 'master' Your branch is up-to-date with 'origin/master'. real 0m2.267s user 0m0.295s sys 0m1.948s + git checkout removed-files Switched to branch 'removed-files' Your branch is up-to-date with 'origin/removed-files'. real 0m0.715s user 0m0.072s sys 0m0.672s ### Results Default Git: 6m2.979s + 0m1.310s = 364s Git with native LFS cache access: 0m2.267s + 0m0.715s = 4s The native cache solution is almost 100x faster when switching branches on my local machine with a test repository containing 15,000 Git LFS files. Based on my previous experience with Git LFS clone I expect even more dramatic results on Windows. Thanks, Lars [1] https://git-lfs.github.com/ [2] https://github.com/blog/1986-announcing-git-large-file-storage-lfs [3] http://blogs.atlassian.com/2016/02/git-lfs-for-designers-game-developers-architects/ [4] https://about.gitlab.com/2015/11/23/announcing-git-lfs-support-in-gitlab/ [5] git-lfs/git-lfs#931 (comment) [6] git-lfs/git-lfs#988 [7] https://developer.atlassian.com/blog/2016/04/git-lfs-12-clone-faster/ [8] http://article.gmane.org/gmane.comp.version-control.git/297809
Adds a new command
git lfs clonewhich allows for more efficient cloning of LFS repositories by suppressing filters during clone/checkout then doinggit lfs pullafterwards. Automatically suppresses error messages generated by this process.Addresses #931, thanks to @larsxschneider for the extensive analysis, this is 100% based on his findings.