fix megatron bert convert state dict naming by Codle · Pull Request #15820 · huggingface/transformers · GitHub

Conversation

@Codle
Copy link
Contributor

@Codle Codle commented Feb 24, 2022

What does this PR do?

Fixes a state_dict key-renaming error in the Megatron-BERT convert script. In checkpoint v3, Megatron-LM renamed attention -> self_attention; HF already updated this key in the GPT convert script (#12007), but the BERT convert script has not been updated yet. The affected keys look like the listing below; a sketch of the renaming logic follows it.

layers.0.input_layernorm.weight
layers.0.input_layernorm.bias
layers.0.self_attention.query_key_value.weight
layers.0.self_attention.query_key_value.bias
layers.0.self_attention.dense.weight
layers.0.self_attention.dense.bias
layers.0.post_attention_layernorm.weight
layers.0.post_attention_layernorm.bias
layers.0.mlp.dense_h_to_4h.weight
layers.0.mlp.dense_h_to_4h.bias
layers.0.mlp.dense_4h_to_h.weight
layers.0.mlp.dense_4h_to_h.bias
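
A minimal sketch of the idea, not the exact diff in this PR: the convert script can normalize the layer prefix so that checkpoints written before and after Megatron-LM's attention -> self_attention rename are both handled. The function and variable names below are illustrative, not taken from the repository.

```python
import re


def normalize_megatron_layer_keys(state_dict):
    """Map older `layers.N.attention.*` keys onto the newer
    `layers.N.self_attention.*` naming used by checkpoint v3."""
    normalized = {}
    for key, value in state_dict.items():
        # `.attention.` only matches the old form; in `self_attention` the word
        # is preceded by an underscore, so already-renamed keys are left untouched.
        normalized[re.sub(r"\.attention\.", ".self_attention.", key)] = value
    return normalized
```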

Who can review?

@LysandreJik

@HuggingFaceDocBuilder
Copy link

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Copy link
Member

@LysandreJik LysandreJik left a comment


Thank you, this looks good to me! Pinging @jdemouth as well!

@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@LysandreJik
Copy link
Member

Merging

@LysandreJik LysandreJik merged commit 33cd4be into huggingface:main Apr 18, 2022
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022