feat: refit metadata optimization #686
PR looks good to me. I noticed that the function serialization change is also part of this PR. Of the 8% performance gain, do we have a breakdown of how much comes from the function serialization and how much from the metadata optimization? Just wondering how much we actually gain from the metadata part.
Also, a minor thing: since we are now timing the refit separately, do you mind removing this print? It spams the logs for large models.
@yfw Almost all of the 8% performance gain comes from the function serialization. The other change, passing only the list of keys during refitting, may not offer any speedup, but it results in cleaner, more readable code. It is reasonable to expect only a small improvement from switching the serialized metadata from a dictionary of (key, offset) pairs to a plain key list, which requires the offsets to be reconstructed locally.
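A minimal sketch of the key-list idea, in plain Python so it runs without PyTorch. It assumes the sender packs all parameters into one contiguous buffer in the order given by the key list, and that the receiver already knows each parameter's shape locally, so it can recompute every offset as a running sum of element counts instead of receiving it. The helpers `pack`, `reconstruct_offsets`, and `unpack` are hypothetical names for illustration, not this repository's API, and flat Python lists stand in for tensors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, ...]


def pack(keys: Sequence[str], flat_params: Dict[str, List[float]]) -> List[float]:
    """Sender side (hypothetical): concatenate parameters in `keys` order.

    Only `keys` travels with the buffer as metadata; no offsets are sent.
    """
    buffer: List[float] = []
    for key in keys:
        buffer.extend(flat_params[key])
    return buffer


def reconstruct_offsets(
    keys: Sequence[str], local_shapes: Dict[str, Shape]
) -> Dict[str, Tuple[int, int]]:
    """Receiver side (hypothetical): rebuild (offset, numel) per key from
    locally known shapes, walking keys in the same order the sender used."""
    offsets: Dict[str, Tuple[int, int]] = {}
    cursor = 0
    for key in keys:
        numel = math.prod(local_shapes[key])  # element count of this parameter
        offsets[key] = (cursor, numel)
        cursor += numel
    return offsets


def unpack(
    buffer: Sequence[float], keys: Sequence[str], local_shapes: Dict[str, Shape]
) -> Dict[str, List[float]]:
    """Slice the packed buffer back into per-parameter chunks."""
    offsets = reconstruct_offsets(keys, local_shapes)
    expected = sum(numel for _, numel in offsets.values())
    if expected != len(buffer):
        raise ValueError(f"buffer has {len(buffer)} elements, expected {expected}")
    return {key: list(buffer[start:start + numel]) for key, (start, numel) in offsets.items()}


if __name__ == "__main__":
    shapes = {"layer.weight": (2, 2), "layer.bias": (2,)}
    params = {"layer.weight": [1.0, 2.0, 3.0, 4.0], "layer.bias": [5.0, 6.0]}
    keys = ["layer.weight", "layer.bias"]
    print(reconstruct_offsets(keys, shapes))  # {'layer.weight': (0, 4), 'layer.bias': (4, 2)}
    print(unpack(pack(keys, params), keys, shapes) == params)  # True
```

The trade-off in this sketch is that sender and receiver must agree on both the key order and the shapes. The length check in `unpack` catches a total-size mismatch, but not a reordering of equally sized parameters.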
Thanks @ZhiyuLi-Nvidia, LGTM!
One small comment; otherwise LGTM.
Done. Could you take another look, @yfw?
Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Signed-off-by: tpoisonooo <khj.application@aliyun.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Signed-off-by: Qidong Su <qidongs@nvidia.com>
What does this PR do?
fix: maintain fp32 mlp.router.expert_bias even with bf16 enabled
track refitting time inside prepare_for_generation
refit metadata optimization: pass only the list of keys during refitting
benchmark: 8% gain in refit performance, and the code is cleaner.
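The fp32 fix for mlp.router.expert_bias can be illustrated with a hedged sketch. The PR does not say why this bias must stay in fp32; one plausible reason, assumed here, is that it receives small incremental updates that bfloat16 rounds away, since bfloat16 keeps only 7 explicit mantissa bits (a spacing of 2^-7 near 1.0). The sketch emulates bfloat16 rounding in pure Python through the float32 bit pattern and shows a per-name dtype rule. `to_bf16`, `accumulate`, `target_dtype`, and `FP32_PARAM_SUFFIXES` are hypothetical names, not part of this repository or any library.

```python
import struct

# Hypothetical rule: parameter-name suffixes kept in float32 even when the
# rest of the model is cast to bfloat16.
FP32_PARAM_SUFFIXES = ("mlp.router.expert_bias",)


def target_dtype(name: str, default: str = "bfloat16") -> str:
    """Return the dtype name a parameter should be kept in."""
    return "float32" if name.endswith(FP32_PARAM_SUFFIXES) else default


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-half-to-even).

    bfloat16 is the upper 16 bits of a float32, so the float32 bit pattern is
    rounded at bit 16. NaN handling is omitted; x must fit in float32.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    lsb = (bits >> 16) & 1  # lowest kept bit, used for ties-to-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y


def accumulate(start: float, step: float, n: int, use_bf16: bool) -> float:
    """Apply n small additive updates, optionally rounding to bf16 after each."""
    value = start
    for _ in range(n):
        value += step
        if use_bf16:
            value = to_bf16(value)
    return value


if __name__ == "__main__":
    print(target_dtype("decoder.layers.0.mlp.router.expert_bias"))  # float32
    print(target_dtype("decoder.layers.0.mlp.router.weight"))  # bfloat16
    print(accumulate(1.0, 1e-3, 100, use_bf16=True))  # 1.0: every update is rounded away
    print(accumulate(1.0, 1e-3, 100, use_bf16=False))  # approximately 1.1
```

Keeping only the router bias in fp32 should cost little memory, assuming (as is typical) it holds one value per expert per layer, while the large weight matrices still get the bf16 savings.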
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information