Adding Support for Qwen3-Next #40771

bozheng-hit · 2025-09-09T14:08:53Z

Adding Support for Qwen3-Next

This PR adds code support for the upcoming Qwen3-Next models. For information about Qwen, please visit:
👉 https://github.com/QwenLM/Qwen3

Special thanks to @Cyrilvallez and @ArthurZucker for their valuable feedback and thorough review of this PR!

Cyrilvallez · 2025-09-09T16:31:18Z

run-slow: qwen3_next

github-actions · 2025-09-09T16:32:52Z

This comment contains run-slow, running the specified jobs:

models: ['models/qwen3_next']
quantizations: [] ...

Cyrilvallez · 2025-09-09T17:02:27Z

run-slow: qwen3_next

github-actions · 2025-09-09T17:03:47Z

This comment contains run-slow, running the specified jobs:

models: ['models/qwen3_next']
quantizations: [] ...

github-actions · 2025-09-09T21:08:51Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, qwen3_next

Cyrilvallez · 2025-09-09T21:46:51Z

All good! Merging!

woct0rdho · 2025-09-10T00:48:59Z

Hi @bozheng-hit, you mentioned that it has high inference throughput, but the MoE layer in transformers is currently slow. Do you plan to replace the MoE layer with a fused kernel, as GPT-OSS did?

surak · 2025-09-10T09:21:16Z

When will it come out? It isn't available anywhere yet.

ArthurZucker · 2025-09-10T12:50:30Z

Efficient MoE layers are planned for all MoE models in transformers! 🤗
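For readers unfamiliar with the concern raised above, here is a minimal pure-Python sketch of a loop-based top-k MoE forward pass. It is an illustration only, not the transformers implementation: the function name `moe_forward_loop`, the scalar toy experts, and the renormalization of the top-k router probabilities are all assumptions made for the example. The point it tries to show is that a naive layer visits every expert in turn and handles only the tokens routed to it, which is the kind of per-expert work a fused kernel would batch together.

```python
import math
from typing import Callable, Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of router logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward_loop(
    tokens: Sequence[float],
    router_logits: Sequence[Sequence[float]],
    experts: Sequence[Callable[[float], float]],
    top_k: int,
) -> List[float]:
    """Hypothetical loop-based MoE forward pass on scalar 'tokens'."""
    # Step 1: pick the top-k experts per token and renormalize their weights
    # (renormalization is an assumption of this sketch).
    routes: List[Dict[int, float]] = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in top)
        routes.append({i: probs[i] / norm for i in top})

    # Step 2: loop over experts one by one; each expert only processes the
    # tokens routed to it. This sequential loop is the slow part.
    outputs = [0.0] * len(tokens)
    for expert_idx, expert in enumerate(experts):
        for token_idx, token in enumerate(tokens):
            weight = routes[token_idx].get(expert_idx)
            if weight is not None:
                outputs[token_idx] += weight * expert(token)
    return outputs


# Toy usage: three scalar experts, two tokens, two experts selected per token.
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: -x]
toy_out = moe_forward_loop(
    tokens=[1.0, 2.0],
    router_logits=[[0.0, 0.0, -100.0], [-100.0, 0.0, 0.0]],
    experts=toy_experts,
    top_k=2,
)
```

In a real model the experts are feed-forward networks and the tokens are tensors, so the cost of the outer loop grows with the number of experts even when each expert sees only a few tokens.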

d-kleine · 2025-09-20T23:33:49Z

src/transformers/models/qwen3_next/modeling_qwen3_next.py

+    conv_state,
+    weight,
+    bias=None,
+    activation=None,


The activation parameter seems to have no functionality. What is it for?

Cyrilvallez · 2025-09-23

It keeps the function signatures consistent between the fast path and the torch path.

d-kleine · 2025-09-23

I see – thanks! 👍🏻
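The exchange above can be illustrated with a small pure-Python sketch. It is not the code from modeling_qwen3_next.py: the names `fast_conv_update`, `fallback_conv_update`, and `pick_conv_update` are hypothetical, the convolution is reduced to a single channel on plain lists, and the assumption that the fallback always applies SiLU is made only for the example. What it tries to show is why a fallback might accept an `activation` argument it never reads: callers can then swap the two implementations without changing the call site.

```python
import math
from typing import Callable, List, Optional


def fast_conv_update(
    conv_state: List[float],
    x: float,
    weight: List[float],
    bias: Optional[float] = None,
    activation: Optional[str] = None,
) -> float:
    """Stand-in for an optimized kernel that honors `activation`."""
    window = conv_state + [x]  # past inputs plus the new one
    out = sum(w * v for w, v in zip(weight, window)) + (bias or 0.0)
    conv_state[:] = window[1:]  # slide the state window in place
    if activation == "silu":
        out = out / (1.0 + math.exp(-out))
    return out


def fallback_conv_update(
    conv_state: List[float],
    x: float,
    weight: List[float],
    bias: Optional[float] = None,
    activation: Optional[str] = None,  # unused: kept only for signature parity
) -> float:
    """Stand-in for a plain fallback that always applies SiLU (assumption)."""
    window = conv_state + [x]
    out = sum(w * v for w, v in zip(weight, window)) + (bias or 0.0)
    conv_state[:] = window[1:]
    return out / (1.0 + math.exp(-out))


def pick_conv_update(fast_available: bool) -> Callable[..., float]:
    """Select an implementation; the call site stays identical either way."""
    return fast_conv_update if fast_available else fallback_conv_update


# The same call works with either implementation.
state_fast, state_slow = [1.0, 2.0], [1.0, 2.0]
kernel = [0.5, 0.5, 1.0]
y_fast = pick_conv_update(True)(state_fast, 3.0, kernel, bias=None, activation="silu")
y_slow = pick_conv_update(False)(state_slow, 3.0, kernel, bias=None, activation="silu")
```

With matching signatures, the selection can happen once at import time and every later call passes the same arguments, regardless of which path was chosen.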

* Add Qwen3-Next.
* fix
* style
* doc
* simplify
* fix name
* lazy cache init to allow multi-gpu inference
* simplify
* fix config to support different hybrid ratio.
* remove last commit (redundant)
* tests
* fix test

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

bozheng-hit force-pushed the qwen3_next branch from 457f7b8 to a4dfa88 Compare September 9, 2025 14:13

bozheng-hit added 2 commits September 9, 2025 18:13

Add Qwen3-Next.

4b742f9

fix

fd6affc

Cyrilvallez force-pushed the qwen3_next branch from bfa3329 to fd6affc Compare September 9, 2025 16:16

style

216fd44

Cyrilvallez added 3 commits September 9, 2025 18:39

doc

465f683

simplify

816c265

fix name

540fd54

Cyrilvallez and others added 5 commits September 9, 2025 20:41

lazy cache init to allow multi-gpu inference

0c68c38

simplify

5511a9f

fix config to support different hybrid ratio.

a8df917

remove last commit (redundant)

1adfefa

tests

2c19e6e

fix test

5828cfa

Cyrilvallez merged commit b928235 into huggingface:main Sep 9, 2025
18 of 21 checks passed

terrykong mentioned this pull request Sep 10, 2025

qwen3-next dtensor support NVIDIA-NeMo/RL#1108

Open

Jintao-Huang mentioned this pull request Sep 11, 2025

[model] Support qwen3Next (megatron) modelscope/ms-swift#5764

Merged

jacekpoplawski mentioned this pull request Sep 11, 2025

Feature Request: Qwen3-Next support ggml-org/llama.cpp#15940

Open

4 tasks

rankaiyx mentioned this pull request Sep 12, 2025

[Model Request] Qwen3-Next-80B-A3B mlc-ai/mlc-llm#3335

Open

d-kleine reviewed Sep 20, 2025


jakexcosme mentioned this pull request Oct 22, 2025

Feature Request: Qwen3-Next support COG-GTM/llama.cpp#144

Open

4 tasks


d-kleine Sep 20, 2025


Cyrilvallez Sep 23, 2025


d-kleine Sep 23, 2025



6 participants
