fix: remove reference_model_buffers in fsdp2 by yuki-97 · Pull Request #558 · NVIDIA-NeMo/RL · GitHub

@yuki-97
Contributor

@yuki-97 yuki-97 commented Jun 26, 2025

What does this PR do?

Removes `reference_model_buffers` in FSDP2 to fix the following error with DeepSeek-V2-Lite:

RuntimeError: The size of tensor a (512) must match the size of tensor b (163840) at non-singleton dimension 0
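
For readers unfamiliar with this class of error, here is a minimal sketch of one way it can arise. It rests on an assumption that the PR does not confirm: a buffer is snapshotted at one length, the module later rebuilds that buffer at a different length, and the snapshot is then copied back in place. `ToyRotary`, `cos_cached`, `snapshot_buffers`, and `restore_buffers` are hypothetical names for illustration, not NeMo-RL code.

```python
import torch
from torch import nn


class ToyRotary(nn.Module):
    """Hypothetical module whose buffer can be rebuilt at a different length."""

    def __init__(self, length: int) -> None:
        super().__init__()
        self.register_buffer("cos_cached", torch.zeros(length), persistent=False)

    def resize(self, length: int) -> None:
        # Re-registering under the same name replaces the buffer with a new shape.
        self.register_buffer("cos_cached", torch.zeros(length), persistent=False)


def snapshot_buffers(module: nn.Module) -> dict[str, torch.Tensor]:
    # Keep an independent copy of every buffer (assumed snapshot strategy).
    return {name: buf.detach().clone() for name, buf in module.named_buffers()}


def restore_buffers(module: nn.Module, saved: dict[str, torch.Tensor]) -> None:
    # Copy the saved values back in place; this requires compatible shapes.
    with torch.no_grad():
        for name, buf in module.named_buffers():
            buf.copy_(saved[name])


model = ToyRotary(163840)
saved = snapshot_buffers(model)   # snapshot taken at length 163840
model.resize(512)                 # buffer later rebuilt at length 512

try:
    restore_buffers(model, saved)
    error_message = None
except RuntimeError as exc:       # in-place copy fails on the shape mismatch
    error_message = str(exc)

print(error_message)
```

If something like this is the cause, then not keeping a separate copy of the buffers, as this PR does, avoids the mismatched in-place copy altogether.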

Test Result

[Two images, one per model: DeepSeek-V2-Lite, Llama-3.1-8B-Instruct]

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 requested review from terrykong and yfw June 26, 2025 15:17
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Jun 26, 2025
@terrykong terrykong enabled auto-merge June 26, 2025 16:32
@terrykong terrykong added CI:docs Run doctest and removed CI:L1 Run doctests, unit tests, and functional tests labels Jun 26, 2025
@terrykong terrykong added this pull request to the merge queue Jun 26, 2025
Merged via the queue into main with commit 0507318 Jun 26, 2025
26 of 32 checks passed
@terrykong terrykong deleted the yukih/deepseek-fsdp2 branch June 26, 2025 19:22
parthchadha pushed a commit that referenced this pull request Jun 26, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 27, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 30, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
therealnaveenkamal pushed a commit to therealnaveenkamal/RL that referenced this pull request Jul 7, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
