- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
Clone files on OSX-like platforms when possible, instead of copying the whole file #79243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
• Use copyfile (with COPYFILE_CLONE_FORCE) when possible on macOS to clone the file while still keeping file locking logic intact • Split common Unix logic into multiple functions that the macOS implementation uses parts of at different times • Add string version of ResolveLinkTarget to save the allocation since part of the code needs it • Need to add tests to check the file is actually cloned so we know if it works or not
| Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsFixes #77835. This should improve performance greatly for  Informative: 
 Todo: 
 
 | 
| Btw, this implementation is not done, I want to see if any tests fail first though. | 
• I missed setting StringMarshalling on the LibraryImport • I missed partial on the CopyFile method implementations
• Apparently 'Both partial method declarations must be unsafe or neither may be unsafe'
• I love partial methods • Move unsafe keyword into the OSX method rather than in method declaration • Fix accessibility modifier of CopyFile in FileSystem.Unix.cs
• Fix indentation (should have used spaces) • Import Microsoft.Win32.SafeHandles in FileSystem.CopyFile.OSX.cs file • Fix for SA1205 'Partial elements should declare an access modifier' • Fix typo in FileSystem.CopyFile.OtherUnix.cs of 'startedCopyFile' • Fix missing openDst parameter code in StartCopyFile • Fix not checking the nullability in StandardCopyFile, which led to 'Possible null reference argument for parameter'
• Add missing parameter value (openNewFile) to OpenCopyFileDstHandle • Fix misnamed variable usages throughout some code • Properly qualify Interop.ErrorInfo • This should be the last fix for build issues for this set
| Sorry for all of the fix commits, I couldn't get build locally to work, so I copied the code into a new project to test it and must have missed some of the changes here. | 
• Add missing nullable ? for dstHandle (which obtains the file lock)
• Test failures were since copying a file onto itself should cause an error, this fixes that
• Fix nullability issues
…ain) • The variable in the if statements weren't updated • Handling for if readlink fails
…t#77835' • The source path should have used sourceFullPath rather than destPath
| @marek-safar, this applies to macOS, iOS, tvOS, and Mac Catalyst in case you wanted to add tags for those. Can you also 'assign' me if you're able? Thanks. | 
| Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. | 
| @filipnavara is there any chance you could review this PR? You are currently the most experience person in this particular area. | 
| Is there a reason why 24 test runs have been cancelled so far? Do the tests need to be re-run? | 
| I'm really looking forward to this change! Will we be seeing it land in a .NET 8 preview? | 
| 
 If it's merged in the next month or two, it will be in .NET 8. | 
| 
 It's hopefully just a few more stages of review at most. I'm hoping it will make it to .NET 8 also. | 
| Per @adamsitnik's earlier comment, macOS isn't benchmarked in the perf lab. So before merging, we need some benchmark results that verify there is a performance gain. @hamarb123 can you look at the documentation Adam shared, and run the benchmarks on your machine? | 
| Benchmark results:
 System.IO.Tests.Perf_FileWith changes
 Without changes
 | 
| 
 That's a pleasant surprise. My expectation was to see a regression for small files with overwrite since we're making additional system calls and the benefit of cloning is low for small files. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks very good, I found only one thing that could be improved (PTAL at my comment).
The perf win is very impressive. I've benchmarked it on my old macBook:
BenchmarkDotNet=v0.13.2.2052-nightly, OS=macOS Monterey 12.6.5 (21G531) [Darwin 21.6.0]
Intel Core i7-5557U CPU 3.10GHz (Broadwell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=8.0.100-preview.6.23312.5
  [Host] : .NET 8.0.0 (8.0.23.30704), X64 RyuJIT AVX2
  PR     : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  main   : .NET 8.0.0 (8.0.23.30704), X64 RyuJIT AVX2| Method | Job | size | Mean | StdDev | Ratio | 
|---|---|---|---|---|---|
| CopyTo | PR | 512 | 438.7 μs | 106.58 μs | 0.75 | 
| CopyTo | main | 512 | 588.0 μs | 15.62 μs | 1.00 | 
| CopyToOverwrite | PR | 512 | 298.8 μs | 6.18 μs | 0.79 | 
| CopyToOverwrite | main | 512 | 381.4 μs | 4.55 μs | 1.00 | 
| CopyTo | PR | 4096 | 237.0 μs | 6.29 μs | 0.39 | 
| CopyTo | main | 4096 | 606.9 μs | 37.95 μs | 1.00 | 
| CopyToOverwrite | PR | 4096 | 293.7 μs | 7.59 μs | 0.76 | 
| CopyToOverwrite | main | 4096 | 388.1 μs | 3.02 μs | 1.00 | 
| CopyTo | PR | 1048576 | 239.9 μs | 10.95 μs | 0.09 | 
| CopyTo | main | 1048576 | 2,567.9 μs | 107.61 μs | 1.00 | 
| CopyToOverwrite | PR | 1048576 | 298.2 μs | 11.47 μs | 0.12 | 
| CopyToOverwrite | main | 1048576 | 2,585.2 μs | 63.77 μs | 1.00 | 
| CopyTo | PR | 104857600 | 247.0 μs | 8.00 μs | 0.001 | 
| CopyTo | main | 104857600 | 230,797.9 μs | 42,923.74 μs | 1.000 | 
| CopyToOverwrite | PR | 104857600 | 291.4 μs | 6.45 μs | 0.001 | 
| CopyToOverwrite | main | 104857600 | 256,931.8 μs | 2,784.23 μs | 1.000 | 
Great job @hamarb123 !!
        
          
                src/libraries/System.Private.CoreLib/src/System/IO/FileSystem.TryCloneFile.OSX.cs
          
            Show resolved
            Hide resolved
        
              
          
                src/libraries/System.Private.CoreLib/src/System/IO/FileSystem.TryCloneFile.OSX.cs
          
            Show resolved
            Hide resolved
        
      | 
 While this isn't a common scenario, in the name of completeness, what does the perf look like if, after copying, the benchmark then modifies the destination file in some way? | 
| 
 That is a very good question! I've modified the benchmarks and just added  [Benchmark]
[Arguments(HalfKibibyte)]
[Arguments(FourKibibytes)]
[Arguments(OneMibibyte)]
[Arguments(HundredMibibytes)]
public void CopyTo_AndModify(int size)
{
    File.Delete(_testFilePath);
    File.Copy(_filesToRead[size], _testFilePath); // overwrite defaults to false
+   File.WriteAllBytes(_testFilePath, File.ReadAllBytes(_filesToRead[size]));
}
[Benchmark]
[Arguments(HalfKibibyte)]
[Arguments(FourKibibytes)]
[Arguments(OneMibibyte)]
[Arguments(HundredMibibytes)]
public void CopyToOverwrite_AndModify(int size)
{
    File.Copy(_filesToRead[size], _testFilePath, overwrite: true);
+   File.WriteAllBytes(_testFilePath, File.ReadAllBytes(_filesToRead[size]));
}We have still a major win for large files and no regression for smaller files 
 | 
| 
 WriteAllBytes opens the file as Create, which means it's going to effectively delete the destination and write a new file rather than modify the one that's there. | 
| 
 Sorry, I forgot about that (BTW it just proves that we should really add #84532) Updated benchmark: private byte[] _oneKb = new byte[1024];
[Benchmark]
[Arguments(HalfKibibyte)]
[Arguments(FourKibibytes)]
[Arguments(OneMibibyte)]
[Arguments(HundredMibibytes)]
public void CopyTo_AndModify(int size)
{
    File.Delete(_testFilePath);
    File.Copy(_filesToRead[size], _testFilePath); // overwrite defaults to false
    
    using FileStream fs = File.OpenWrite(_testFilePath);
    fs.Write(_oneKb);
}
[Benchmark]
[Arguments(HalfKibibyte)]
[Arguments(FourKibibytes)]
[Arguments(OneMibibyte)]
[Arguments(HundredMibibytes)]
public void CopyToOverwrite_AndModify(int size)
{
    File.Copy(_filesToRead[size], _testFilePath, overwrite: true);
    using FileStream fs = File.OpenWrite(_testFilePath);
    fs.Write(_oneKb);
}And the results: BenchmarkDotNet=v0.13.2.2052-nightly, OS=macOS Monterey 12.6.5 (21G531) [Darwin 21.6.0]
Intel Core i7-5557U CPU 3.10GHz (Broadwell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=8.0.100-preview.6.23312.5
  [Host] : .NET 8.0.0 (8.0.23.30704), X64 RyuJIT AVX2
  PR     : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  main   : .NET 8.0.0 (8.0.23.30704), X64 RyuJIT AVX2
IterationCount=40
 | 
| 
 Not really 😄 
 Thanks. So this does regress small overwrites that are subsequently modified. | 
| 
 Why? Do you believe that appending a buffer of bytes to an existing file is a rare scenario? | 
| Thanks for this nice change @hamarb123 . Would you like to tackle another issue? There are also a number marked api-approved that are available. | 
| 
 
 
 I'm not commenting on the scenario of appending a buffer of bytes to an existing file. I'm commenting that a need to mutate a recently copied file as part of benchmarking a copy-on-write optimization and to achieve that writing two short lines of code doesn't prove that we need to add a new API. | 
| 
 Thanks! I'll have a look tomorrow (also known as a today at a sensible hour lol). I was also going to do the one for ReFS on Windows, was setting up a VM to test dev drives yesterday. 
 Just following up on this also. Presumably this API just does a block copy with the files. Is this something that's commonly used (I know I've never done it like this)? Because it doesn't seem to benefit from this, but maybe no-one uses it, idk. | 
| 
 IIRC The  
 I don't have enough data to answer your question, but if you can provide an optimization that improves such scenario without introducing a lot of code complexity I am happy to review it. | 
| 
 Ah yes, indeed that would be nice to get into .NET 8 | 
Fixes #77835.
This should improve performance greatly for
File.Copywhen copying from one APFS volume to a different file in the volume (I think it's within a volume anyway). It will clone the file which is instant instead of copying all of the bytes, and if cloning doesn't work it will fall back to the normal method of copying.Informative:
clonefilewhen possible on macOS to clone the file while still keeping file locking logic intactTodo: