Hash tensor data during deduplication by VikParuchuri · Pull Request #932 · huggingface/optimum

Conversation

@VikParuchuri
Contributor

What does this PR do?

When exporting a causal LM model to ONNX, the past-key-values variant and the regular variant of the model are merged. The merge loops through every tensor in the graph and stores the values as dict keys. This requires massive amounts of memory (>100 GB for an 11 GB model) and prevents the merge from completing.

This is due to `tensor_data` in `_find_duplicate_initializers` being converted from bytes to a tuple. I assume there is a good reason to convert it to a tuple (maybe for comparison across dtypes).

As such, this PR doesn't remove that conversion; instead, it converts the tuple to a string and hashes it, using SHA-512 to reduce the probability of a hash collision. Deduplication then needs very little memory.
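
A minimal sketch of the idea (illustrative names, not the exact optimum code), assuming `tensor_data` is the tuple built in `_find_duplicate_initializers`:

```python
import hashlib

def tensor_key(tensor_data):
    # str() gives a stable text representation of the tuple; SHA-512 then
    # reduces it to a fixed 64-byte dict key, regardless of tensor size.
    return hashlib.sha512(str(tensor_data).encode("utf-8")).digest()
```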

You could also:

  • Convert to str without hashing, removing the possibility of collisions - this takes more memory than hashing
  • Remove the initial tuple conversion entirely - also more memory than hashing
  • Trade memory for compute by looping through each pair of tensors and comparing them - much slower than hashing

I have done some basic manual testing, and this method seems to work.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@fxmarty
Contributor

fxmarty commented Mar 29, 2023

Thanks a lot for the PR! It's an issue I've had with large models but did not look into; this will surely help!

@fxmarty fxmarty requested review from JingyaHuang and fxmarty March 29, 2023 20:44
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 29, 2023

The documentation is not available anymore as the PR was closed or merged.

@fxmarty
Contributor

fxmarty commented Mar 30, 2023

@VikParuchuri This step, `str(tensor_data).encode("utf-8")`, is quite slow. I wonder if there is anything better we could hash than this? I suspect that for large models this will be prohibitively slow.
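
As a rough illustration of where the time goes (a hypothetical micro-benchmark, not a measurement of the actual export code):

```python
import hashlib
import timeit

import numpy as np

# Hypothetical stand-in for an initializer's data and its tuple conversion.
data = np.random.rand(512, 512).astype(np.float32)
tensor_data = tuple(data.flatten().tolist())

def slow():
    # Formats every element as text before hashing.
    return hashlib.sha512(str(tensor_data).encode("utf-8")).digest()

def fast():
    # Hashes the raw buffer directly, skipping the formatting step.
    return hashlib.sha512(data.tobytes()).digest()

print(timeit.timeit(slow, number=10), timeit.timeit(fast, number=10))
```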

Hashing `tensor_data` directly does not seem to work; I get `object supporting the buffer API required`. This is kind of surprising, as it's of type `<class 'bytes'>`, and what you get after the encode is also bytes. Any idea @michaelbenayoun?

The reason we put that in a tuple is that some initializers have names like `onnx::MatMul_3741` that may differ between the two graphs, so we need to look at the actual data to check whether they are the same.
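
In other words, deduplication has to key on tensor content rather than on those auto-generated names. A hedged sketch of that idea (hypothetical helper, assuming initializers store their payload in `raw_data`; this is not the actual `_find_duplicate_initializers` implementation):

```python
from collections import defaultdict

def find_duplicates(initializers):
    """Group ONNX TensorProto initializers that hold identical data."""
    groups = defaultdict(list)
    for init in initializers:
        # Key on dtype, shape, and raw bytes; the name is deliberately
        # ignored since names like onnx::MatMul_3741 differ across graphs.
        key = (init.data_type, tuple(init.dims), init.raw_data)
        groups[key].append(init.name)
    return [names for names in groups.values() if len(names) > 1]
```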

@fxmarty fxmarty requested a review from michaelbenayoun March 30, 2023 07:36
@michaelbenayoun
Member

Hi @VikParuchuri,

That's great thanks!

> This is due to `tensor_data` in `_find_duplicate_initializers` being converted from bytes to a tuple. I assume there is a good reason to convert it to a tuple (maybe for comparison across dtypes).

I actually do not remember why `tensor_data` was made a tuple. Have you tried removing this? I tried, and it seems to work, at least for the one model I tested.

@fxmarty
Contributor

fxmarty commented Mar 30, 2023

For reference: https://github.com/huggingface/optimum/pull/587/files#r1152974762. I think it's safe to keep it as a tuple and use `to_array` for now.
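
A hedged sketch of that direction, combining `onnx.numpy_helper.to_array` with hashing of the raw buffer (the helper name here is illustrative):

```python
import hashlib

from onnx import numpy_helper

def initializer_digest(initializer):
    # to_array materializes the TensorProto whatever its storage format
    # (raw_data, float_data, ...), so the digest depends only on the values.
    array = numpy_helper.to_array(initializer)
    # tobytes() exposes a contiguous buffer that hashlib accepts directly.
    return hashlib.sha512(array.tobytes()).digest()
```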

@michaelbenayoun
Member

@fxmarty I am not talking about the tuple in the dictionary, but about `tensor_data`.

@VikParuchuri
Contributor Author

@fxmarty Your method seems to work! I updated the PR.

@fxmarty
Contributor

fxmarty commented Mar 30, 2023

@VikParuchuri Great! Could you do `pip install -e .[quality]` and `make style`?

@VikParuchuri
Contributor Author

Done

Contributor

@fxmarty fxmarty left a comment


LGTM, thank you for the fix!

@fxmarty fxmarty merged commit bcfe24e into huggingface:main Mar 30, 2023