Adding Support for Qwen3-Next #40771
run-slow: qwen3_next
This comment contains run-slow, running the specified jobs: models: ['models/qwen3_next']
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, qwen3_next
All good! Merging!
Hi @bozheng-hit, you mentioned that the model has high inference throughput, but the MoE layer in transformers is currently slow. Do you plan to replace the MoE layer with a fused kernel, as GPT-OSS did?
When will it come out? It isn't available anywhere yet.
Efficient MoEs are planned for all MoEs in transformers! 🤗
conv_state,
weight,
bias=None,
activation=None,
The activation param seems to have no functionality; what is it good for?
It keeps the function signature consistent between the fast path and the torch path.
I see – thanks! 👍🏻
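To illustrate the point made in this thread, here is a minimal sketch of why a fallback can carry an unused `activation` argument: it lets a caller swap between a fast kernel and a fallback without changing the call site. This is not the code from the PR. It is a pure-Python, single-channel stand-in, and the names `fallback_conv_update` and `select_conv_update` are hypothetical. The sketch assumes the fallback always applies SiLU regardless of `activation`.

```python
import math


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def fallback_conv_update(x, conv_state, weight, bias=None, activation=None):
    """Hypothetical fallback for a causal conv state update (one channel).

    x          : list of new input samples
    conv_state : list of the most recent inputs, updated in place
    weight     : list of kernel taps, same length as conv_state
    activation : accepted only so the signature matches a fast kernel;
                 this sketch ignores it and always applies SiLU
    """
    state_len = len(conv_state)
    window = conv_state + list(x)
    # Keep only the most recent `state_len` inputs for the next call.
    conv_state[:] = window[-state_len:]

    kernel_size = len(weight)
    out = []
    for t in range(len(window) - kernel_size + 1):
        acc = sum(w * window[t + j] for j, w in enumerate(weight))
        if bias is not None:
            acc += bias
        out.append(silu(acc))
    # Return one output per new input sample.
    return out[len(out) - len(x):]


def select_conv_update(fast_kernel=None):
    """Pick the fast kernel when one is supplied, else the fallback."""
    return fast_kernel if fast_kernel is not None else fallback_conv_update


# The call site is identical whichever implementation is selected.
update = select_conv_update()
state = [0.0, 0.0, 0.0]
y = update([1.0, 2.0], state, weight=[0.0, 0.0, 1.0], activation="silu")
```

Because both paths accept the same arguments, the caller never needs to branch on which implementation is active.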
* Add Qwen3-Next.
* fix
* style
* doc
* simplify
* fix name
* lazy cache init to allow multi-gpu inference
* simplify
* fix config to support different hybrid ratio.
* remove last commit (redundant)
* tests
* fix test

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Adding Support for Qwen3-Next
This PR adds support for the upcoming Qwen3-Next models. For more information about Qwen, please visit:
👉 https://github.com/QwenLM/Qwen3
Special thanks to @Cyrilvallez and @ArthurZucker for their valuable feedback and thorough review of this PR!