[Glm4.5V] fix vLLM support #40696
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for fixing! I'm trusting you, I'm a bit out of context 😄
 @dataclass
-class VideoMetadata:
+class VideoMetadata(Mapping):
What is the motivation for making it iterable?
So that we can use VideoMetadata the same way as a dict. We currently allow users to pass both formats.
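For illustration, here is a minimal sketch of the pattern (the field names are placeholders, not the real VideoMetadata definition): subclassing collections.abc.Mapping lets a dataclass be consumed anywhere a plain dict is expected.

```python
# Illustrative sketch only; field names are assumptions, not the actual
# VideoMetadata fields in transformers.
from collections.abc import Mapping
from dataclasses import dataclass, fields


@dataclass
class VideoMetadata(Mapping):
    fps: float = 24.0
    total_num_frames: int = 0

    # Implementing the three abstract Mapping methods is enough; the
    # Mapping mixins then supply keys(), values(), items(), ** unpacking,
    # and dict() conversion for free.
    def __getitem__(self, key):
        return getattr(self, key)

    def __iter__(self):
        return (f.name for f in fields(self))

    def __len__(self):
        return len(fields(self))


meta = VideoMetadata(fps=30.0, total_num_frames=120)
assert meta["fps"] == 30.0                                   # dict-style access
assert dict(meta) == {"fps": 30.0, "total_num_frames": 120}  # dict conversion
```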
 patch_size = images_kwargs.get("patch_size", self.patch_size)
 merge_size = images_kwargs.get("merge_size", self.merge_size)
-size = images_kwargs.get("size", self.size)
+size = images_kwargs.get("size", {"shortest_edge": 112 * 112, "longest_edge": 28 * 28 * 15000})
should we align self.size instead?
The defaults are already using these values. It is the model checkpoint that has another set of values saved, and for some reason we never use them. I didn't want to break BC, but using the config values is actually correct.
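A hypothetical sketch of the two fallback strategies being weighed here (the function and constant names are made up for illustration):

```python
# Hypothetical names; resolve_size_* and HARDCODED_SIZE do not exist in
# transformers and only illustrate the precedence under discussion.
HARDCODED_SIZE = {"shortest_edge": 112 * 112, "longest_edge": 28 * 28 * 15000}


def resolve_size_bc(images_kwargs: dict) -> dict:
    # Behavior kept by this PR: per-call kwargs win, otherwise fall back
    # to the hardcoded defaults, ignoring the checkpoint's saved value.
    return images_kwargs.get("size", HARDCODED_SIZE)


def resolve_size_from_config(images_kwargs: dict, config_size: dict) -> dict:
    # Planned follow-up: fall back to the value saved in the checkpoint
    # config (self.size), which the authors confirmed is correct.
    return images_kwargs.get("size", config_size)
```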
Hmmm, can we update the config values instead? Or will the Hub PR not be merged?
We can try, let me ask one of the authors who is on Slack.
Asked the authors, and the config values are the correct ones. So I will update the processor in the next PR to use config defaults and mark it as slightly breaking.
Alright, thanks a lot!
What does this PR do?
GLM-4.5V inference does not work in the latest release, and this PR fixes it. Specifically, users can now pass a VideoMetadata object to the processor's call, which is needed for backwards compatibility (adding a test for this, one sec). There is also a tiny fix in image processing to align a helper utility with self._preprocess.
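A hedged usage sketch of what the fix enables; the checkpoint id, metadata field names, and processor keyword arguments below are assumptions based on typical transformers usage, so check the model card and processor signature before relying on them.

```python
# Sketch only: checkpoint id and kwarg names are assumptions, not
# verified against the GLM-4.5V processor.
import numpy as np
from transformers import AutoProcessor
from transformers.video_utils import VideoMetadata

processor = AutoProcessor.from_pretrained("zai-org/GLM-4.5V")

video = np.zeros((16, 224, 224, 3), dtype=np.uint8)      # placeholder frames
metadata = VideoMetadata(fps=2.0, total_num_frames=16)   # or an equivalent dict

inputs = processor(
    text="<video> Describe the clip.",
    videos=[video],
    video_metadata=[metadata],  # dict or VideoMetadata both accepted after this fix
    return_tensors="pt",
)
```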