-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
[Model] Add GraniteMoeHybrid 4.0 model #17497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Model] Add GraniteMoeHybrid 4.0 model #17497
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
55bc3e7
to
3569fe4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass through looks good and implementation looks clean!
When will the model be available for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can ibm-research/granite-4.0-tiny-test
be added to test_hybrid.py
? It would be good to get a TP test in place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @tlrmchlsmth, I'm working on the transformers side of this PR - ibm-research/granite-4.0-tiny-test
won't be the name of the actual model, it's just a placeholder since the name of the model hasn't been decided quite yet 😅
I agree that it would be nice to add a hybrid model test though
(CC @DarkLight1337 since I had just sent a note to him about this PR a little while ago also)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the model to the test_hybrid.py
, but since the name is not final and therefore the link to HF is not established yet, the tests are marked as skipped for now. Is that okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bohnstingl sounds good! Were you able to run the tests on the upcoming model? I had seen some tp failures in my environment, but not just for this model, so may have something going on with my machines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this! some thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason the super().__init__()
is called in the middle here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the call to super().__init__()
has been moved to the first instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks! I think you may have forgotten to push changes for this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the asserts and raise errors instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asserts are replaced with RuntimeErrors
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be changed to set self.rotary_emb
to None
if it's not rope
? This will avoid potential attribute errors and we can do
if self.rotary_emb is not None:
query, key = self.rotary_emb(positions, query, key)
in forward()
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to store the position_embedding_type
attribute explicitly. Now we set self.rotary_emb
directly based on config.position_embedding_type
and to None
if there is no rope being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be overwritten right below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial value of residual is now not fixed to None anymore
.
Can you move the test based on #17459? |
This pull request has merge conflicts that must be resolved before it can be |
@DarkLight1337, I moved the |
Yes sounds good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now, we can just comment it out in the hybrid model list instead of skipping like this for these tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the changes of test_hybrid.py and just added the models as a comment to the HYBRID_MODELS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be SSM_MODELS[0:1] + HYBRID_MODELS[0:2])
rather than [SSM_MODELS[0], HYBRID_MODELS[0]]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just an approach to include the new model in the test, because with [SSM_MODELS[0], HYBRID_MODELS[0]]
, it would not be tested, right? I reverted those changes though for the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! I think that is on purpose (there is a comment at the top of the file), although now sure about the reason. I assume probably because they take awhile to run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is to reduce CI cost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as tests/models/decoder_only/language/test_hybrid.py
with the new model added right? I think this PR has both files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize, I missed to commit the deletion of the old file. This should now be fixed.
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
590412a
to
0dbb1b9
Compare
@alex-jw-brooks. I apologize, I have missed to push the changes to |
bbff670
to
0d4f3cd
Compare
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Head branch was pushed to by a user without write access
f7de40b
to
495fe33
Compare
@DarkLight1337 , thank you for fixing the spacing. |
The CI only uses the HF transformers version defined in the requirements file |
… in CI Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Head branch was pushed to by a user without write access
…rs in HF Transformers Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Maybe need to also add |
…in HF Transformers Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
I don't know which exact version would include it. To be on the safe side, I've commented out for now. Update: Apparently I cannot comment out, as tests fail then. As suggested, I've used |
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
711dcd4
to
ca47978
Compare
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
ca47978
to
95ede00
Compare
Two CI tests failed, both unrelated to this PR:
and
I've force-pushed to retrigger CI checks and see if this persists. (Generally, I'm not sure if this is the right way to retrigger or is there a way to retrigger selected tests only.) |
* [Model] Add GraniteMoeHybrid 4.0 model (vllm-project#17497) Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> * [easy] Fix logspam on PiecewiseBackend errors (vllm-project#17138) Signed-off-by: rzou <zou3519@gmail.com> * [Bugfix] Fixed prompt length for random dataset (vllm-project#17408) Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com> * [Doc] Update notes for H2O-VL and Gemma3 (vllm-project#17219) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc] Fix ScalarType float4 naming (vllm-project#17690) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> * Fix `dockerfilegraph` pre-commit hook (vllm-project#17698) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Bugfix] Fix triton import with local TritonPlaceholder (vllm-project#17446) Signed-off-by: Mengqing Cao <cmq0113@163.com> * [V1] Enable TPU V1 backend by default (vllm-project#17673) Signed-off-by: mgoin <mgoin64@gmail.com> * [V1][PP] Support PP for MultiprocExecutor (vllm-project#14219) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com> * [v1] AttentionMetadata for each layer (vllm-project#17394) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [Feat] Add deprecated=True to CLI args (vllm-project#17426) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> * [Docs] Use gh-file to add links to tool_calling.md (vllm-project#17709) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> * [v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (vllm-project#17479) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [doc] Add RAG Integration example (vllm-project#17692) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Bugfix] Fix modality limits in vision language example (vllm-project#17721) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * Make right sidebar more readable in "Supported Models" (vllm-project#17723) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [TPU] Increase block size and reset block shapes (vllm-project#16458) * [Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (vllm-project#16839) Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Signed-off-by: dtransposed <> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> * [Bugfix] Fix for the condition to accept empty encoder inputs for mllama (vllm-project#17732) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> * [Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (vllm-project#16828) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> --------- Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Signed-off-by: rzou <zou3519@gmail.com> Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: windsonsea <haifeng.yao@daocloud.io> Signed-off-by: reidliu41 <reid201711@gmail.com> Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Signed-off-by: dtransposed <> Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Stan Wozniak <77159600+s3woz@users.noreply.github.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Richard Zou <zou3519@users.noreply.github.com> Co-authored-by: Mikhail Podvitskii <podvitskiymichael@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Michael Yao <haifeng.yao@daocloud.io> Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com> Co-authored-by: reidliu41 <reid201711@gmail.com> Co-authored-by: Jevin Jiang <jevin0change@gmail.com> Co-authored-by: d.transposed <damian.bogunowicz@gmail.com> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
The PR adds support for upcoming Granite4.0 models. It is a companion PR to huggingface/transformers#37658 for adding the same model to HF.
Note: Running the model in vLLM depends on having HF Transformers with Granite4.0 support installed, see https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
Note: It is re-opening #17461 again after fixing wrong rebase. Currently rebased on newer main, signed-off, resolved conflicts.
@DarkLight1337 - As per suggestion, Sampler removed from the model.
The HF model is not available in main yet. Model tests are currently marked with 'skip'.
@tdoublep @bohnstingl