-
Notifications
You must be signed in to change notification settings - Fork 598
Hash tensor data during deduplication #932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for the PR, it's an issue I've had with large models but did not look into it, this will surely help! |
|
The documentation is not available anymore as the PR was closed or merged. |
|
@VikParuchuri This step Hashing simply The reason we put that in a tuple is that some initializers have names as |
|
Hi @VikParuchuri, That's great thanks!
I actually do not remember why make |
|
For reference: https://github.com/huggingface/optimum/pull/587/files#r1152974762. I think it's safe to keep as a tuple + use |
|
@fxmarty I am not talking about the tuple in the dictionary but |
|
@fxmarty Your method seems to work! I updated the PR. |
|
@VikParuchuri Great! Could you do |
|
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thank you for the fix!
What does this PR do?
When exporting a causal lm model to onnx, the past key version and regular version are merged. The merge loops through each tensor in the graph and stores the values as dict keys. This requires massive memory (>100GB for an 11GB model), and prevents merging from happening.
This is due to the
tensor_datain_find_duplicate_initializersbeing converted frombytesto atuple. I assume there is a good reason to convert it to a tuple (maybe for comparison across dtypes).As such, this PR doesn't remove that conversion, but instead converts the tuple to a string and hashes it. It uses SHA-512 to reduce the probability of a hash collision. This uses very little memory to do the deduplication.
You could also:
strto remove the possibility of collisions - this will take more memory than hashingtupleconversion entirely - also more memory than hashingI have done some basic manual testing, and this method seems to work.
Before submitting