KEMBAR78
feat: refit metadata optimization by ZhiyuLi-Nvidia · Pull Request #686 · NVIDIA-NeMo/RL · GitHub
Skip to content

Conversation

ZhiyuLi-Nvidia
Copy link
Contributor

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia commented Jul 17, 2025

What does this PR do ?

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia changed the title fit: refit metadata optimization feat: refit metadata optimization Jul 17, 2025
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/yukih/refit-optimization-minimal-serialization branch 2 times, most recently from bd680b0 to bb491df Compare July 18, 2025 08:37
@yfw
Copy link
Contributor

yfw commented Jul 18, 2025

PR looks good to me. I noticed that the function serialization is also part of this PR. Out of the 8% perf gain, do we have a breakdown of how much can be attributed to the function serialization and how much to the metadata optimization? Just wondering how much we are actually gaining from that part.

@yfw
Copy link
Contributor

yfw commented Jul 18, 2025

Also a minor thing since we are timing the refit separately now. Do you mind removing this print which spams the logs for large models?

@ZhiyuLi-Nvidia
Copy link
Contributor Author

ZhiyuLi-Nvidia commented Jul 18, 2025

PR looks good to me. I noticed that the function serialization is also part of this PR. Out of the 8% perf gain, do we have a breakdown of how much can be attributed to the function serialization and how much to the metadata optimization? Just wondering how much we are actually gaining from that part.

@yfw almost all 8% perf gain is from the function serialization.

pass list of keys only during refitting This change may not offer any speed optimization but will result in cleaner and more readable code. It might be reasonable to expect little improvement from changing a dictionary (key, offset pair) to a key list (requiring local offset reconstruction) during serialization.

yuki-97
yuki-97 previously approved these changes Jul 21, 2025
Copy link
Contributor

@yuki-97 yuki-97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ZhiyuLi-Nvidia , LGTM!

Copy link
Contributor

@guyueh1 guyueh1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment otherwise LGTM

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/yukih/refit-optimization-minimal-serialization branch from 2025a57 to e677b9c Compare July 21, 2025 21:08
@github-actions github-actions bot added documentation Improvements or additions to documentation CI Relating to CI labels Jul 21, 2025
@ZhiyuLi-Nvidia
Copy link
Contributor Author

Also a minor thing since we are timing the refit separately now. Do you mind removing this print which spams the logs for large models?

Done. Could you take another look @yfw?

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/yukih/refit-optimization-minimal-serialization branch from e677b9c to 2931ae2 Compare July 21, 2025 21:16
@github-actions github-actions bot removed documentation Improvements or additions to documentation CI Relating to CI labels Jul 21, 2025
yfw
yfw previously approved these changes Jul 21, 2025
yuki-97
yuki-97 previously approved these changes Jul 22, 2025
guyueh1
guyueh1 previously approved these changes Jul 22, 2025
yuki-97 added a commit that referenced this pull request Jul 22, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 added a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 and others added 6 commits July 29, 2025 11:26
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia dismissed stale reviews from guyueh1, yuki-97, and yfw via 898903a July 29, 2025 23:20
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/yukih/refit-optimization-minimal-serialization branch from 9f29928 to 898903a Compare July 29, 2025 23:20
@github-actions
Copy link

✅ Submodule Fast-Forward Check Results

Check based on commit: 898903a (PR #686 from zhiyul/yukih/refit-optimization-minimal-serialization)

✅ Submodules that are properly updated:

NeMo: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions
Copy link

✅ Submodule Fast-Forward Check Results

Check based on commit: 19703f7 (PR #686 from zhiyul/yukih/refit-optimization-minimal-serialization)

✅ Submodules that are properly updated:

NeMo: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/yukih/refit-optimization-minimal-serialization branch from 19703f7 to 0d26765 Compare July 29, 2025 23:34
@github-actions
Copy link

✅ Submodule Fast-Forward Check Results

Check based on commit: 0d26765 (PR #686 from zhiyul/yukih/refit-optimization-minimal-serialization)

✅ Submodules that are properly updated:

NeMo: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
@github-actions
Copy link

✅ Submodule Fast-Forward Check Results

Check based on commit: dd74850 (PR #686 from zhiyul/yukih/refit-optimization-minimal-serialization)

✅ Submodules that are properly updated:

NeMo: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@terrykong terrykong enabled auto-merge July 29, 2025 23:43
@terrykong terrykong added this pull request to the merge queue Jul 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 30, 2025
@terrykong terrykong added this pull request to the merge queue Jul 30, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 30, 2025
@terrykong terrykong added this pull request to the merge queue Jul 30, 2025
Merged via the queue into main with commit 7b3fad8 Jul 30, 2025
15 checks passed
@terrykong terrykong deleted the zhiyul/yukih/refit-optimization-minimal-serialization branch July 30, 2025 20:37
tpoisonooo pushed a commit to tpoisonooo/RL that referenced this pull request Aug 4, 2025
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
Signed-off-by: tpoisonooo <khj.application@aliyun.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
soodoshll pushed a commit to soodoshll/RL that referenced this pull request Aug 13, 2025
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants