-
Notifications
You must be signed in to change notification settings - Fork 30.9k
GLM-4.5V Model Support #39805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GLM-4.5V Model Support #39805
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! I think there are a few things we can still remove and let the modular copy. Overall very clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! Looks nice overall 🤗 But we absolutely want to rename ALL classes with Glm4vMoe
prefixes instead of Glm4v_moe
as we enforce CamelCasing! Made a few other comments as well 🤗
|
||
class Glm4v_moeVisionConfig(Glm4vVisionConfig): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want the model name to be Glm4vMoe
everywhere, not Glm4v_moe
!!! We only use CamelCasing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix
|
||
def apply_multimodal_rotary_pos_emb(q, k, cos, sin, mrope_section, unsqueeze_dim=1): | ||
"""Applies Rotary Position Embedding with Multimodal Sections to the query and key tensors (https://qwenlm.github.io/blog/qwen2-vl/). | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be imported, not redefined here (I see it was already mentioned as a review comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code of apply_multimodal_rotary_pos_emb inner is changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
if self.use_qk_norm: # main diff from Llama |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's always True, let's simply use it instead of having an if ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the latest model, this setting is False, and it's not yet certain if any models with True have been released, just like our GLM-4.5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the latest model, this setting is False,
in that case we can remove the codepath! If a new model is release we'll add it for you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your hardwork!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! 🧼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice and minimal! 🚀
run-slow: auto, glm4_moe, glm4v, glm4v_moe |
This comment contains run-slow, running the specified jobs: models: ['models/auto', 'models/glm4_moe', 'models/glm4v', 'models/glm4v_moe'] |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, glm4_moe, glm4v, glm4v_moe |
@bot /style |
## Summary <!--- This is a required section; please describe the main purpose of this proposed code change. ---> This PR adds support for GLM4.1V (GLM-4 Vision) models to the Liger Kernel #855 https://huggingface.co/zai-org/GLM-4.5 This model have been merged in huggingface/transformers#39805 <!--- ## Details This is an optional section; is there anything specific that reviewers should be aware of? ---> ## Testing Done <!--- This is a required section; please describe how this change was tested. ---> Found that `python3 -m pytest test/convergence/bf16/test_mini_models.py -k 'glm4v_moe' -rF` has `AssertionError: [Loss]Number of mismatched elements: 14` with <details> <summary>Test result</summary> ``` AssertionError: [Loss]Number of mismatched elements: 14 Mismatch at index (0, 5): tensor1[(0, 5)] = 8.733983993530273, tensor2[(0, 5)] = 8.52511215209961 Mismatch at index (0, 8): tensor1[(0, 8)] = 7.2776618003845215, tensor2[(0, 8)] = 7.524500846862793 Mismatch at index (0, 9): tensor1[(0, 9)] = 6.917590618133545, tensor2[(0, 9)] = 7.175967216491699 Mismatch at index (0, 13): tensor1[(0, 13)] = 5.685216426849365, tensor2[(0, 13)] = 5.427236557006836 Mismatch at index (0, 14): tensor1[(0, 14)] = 5.337466239929199, tensor2[(0, 14)] = 5.049449443817139 ... and 9 more mismatched elements. ``` </details> <!-- Replace BLANK with your device type. For example, A100-80G-PCIe Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them. --> - Hardware Type: <BLANK> - [x] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code style - [x] run `make test-convergence` to ensure convergence --------- Co-authored-by: Shao Tang <tangshao28@gmail.com> Co-authored-by: Steven Shimizu <shimizust@gmail.com>
This PR will complete two contents