[AOTI][CPU] Consider bias=None case for fbgemm_linear_fp16_weight by hl475 · Pull Request #158535 · pytorch/pytorch · GitHub


@hl475 commented Jul 17, 2025

@pytorch-bot commented Jul 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158535

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 919db33 with merge base 393377d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D78458214

@github-actions

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.


Caused by:

@github-actions

Attention! One of the PyTorch C-stable API files was changed.

You MUST NOT change existing function declarations in this header, as it defines a stable C ABI. If you need to change the signature of a function, introduce a new v2 version of the function and modify code generation to target the new version.


Caused by:

hl475 added a commit to hl475/pytorch that referenced this pull request Jul 17, 2025
Summary: Pull Request resolved: pytorch#158535

Test Plan:
```
buck2 run mode/opt deeplearning/aot_inductor/cpu:cli -- --local-model-path ~/testing/897908634_3.input.predictor --preset foa_early_stage_ranking
```
This reduces the number of `aten::add` ops from 184 to 146.

Rollback Plan:

Differential Revision: D78458214
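The PR makes the `bias=None` case explicit for `fbgemm_linear_fp16_weight`. The fbgemm kernel needs an fbgemm-enabled build, so the sketch below uses plain `torch.nn.functional.linear` as a stand-in to show the semantics being relied on: a linear with no bias is just `x @ w.T`, while a caller forced to supply a bias must either materialize a zero tensor or emit a separate add. Attributing the drop in `aten::add` count to that second pattern is an assumption; the PR does not spell out the mechanism, and the zero-bias and separate-add variants here are hypothetical illustrations.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 8)    # (batch, in_features)
w = torch.randn(16, 8)   # (out_features, in_features), the layout F.linear expects

# Optional-bias form: a single call, no bias tensor needed.
y_none = F.linear(x, w, None)

# Hypothetical workarounds when a bias argument is mandatory:
zero_bias = torch.zeros(16)
y_zero = F.linear(x, w, zero_bias)            # materialize a zero bias
y_add = torch.matmul(x, w.t()) + zero_bias    # separate add op in the graph

print(y_none.shape)
print((y_none - y_add).abs().max().item())
```

All three forms compute the same values (up to floating-point rounding), so when the bias is absent, the single-call form avoids both the dummy tensor and the extra add.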
@hl475 force-pushed the export-D78458214 branch from 21fc4d7 to 93b5591 (July 17, 2025 04:25)

@hl475 force-pushed the export-D78458214 branch from 93b5591 to ab6867d (July 17, 2025 06:28)

@hl475 force-pushed the export-D78458214 branch from ab6867d to 434eb6e (July 21, 2025 07:50)
pytorch-bot pushed a commit that referenced this pull request Jul 21, 2025
Summary: Pull Request resolved: #158535

Test Plan:
# e2e
```
buck2 run mode/opt deeplearning/aot_inductor/cpu:cli -- --local-model-path ~/testing/897908634_3.input.predictor --preset foa_early_stage_ranking
```
This reduces the number of `aten::add` ops from 184 to 146.

# BC & FC
## Pick two models, one with a bias=None linear and one without
## Publish the model with this diff and run the predictor without it
## Publish the model without this diff and run the predictor with it
```
manifold get ads_storage_fblearner/tree/user/facebook/fblearner/predictor/752748048/0/lowering/.predictor.local/input_model ~/testing/752748048_0.input.predictor.local
```
```
rm -rf /tmp/pt2_archive_* && buck2 run mode/opt deeplearning/aot_inductor/cpu:cli -- --local-model-path ~/testing/752748048_0.input.predictor.local --submodule merge --preset ads_second_stage_ranking_type_2 --lowering-backend aotinductor_ep
```
```
buck2 run mode/opt caffe2/torch/fb/model_transform/fx2trt/packaging:load_net_predictor -- --loadMode=Benchmark --inputNetFile /tmp/pt2_archive_merge/package.zip --moduleName=merge --submodToDevice "" --using_aoti_lowering_allowlist=false --benchmarkDontRebatchSamples
```
**Failed**: model with bias=None published with this diff, predictor run without it (P1872435999)
Succeeded: model with bias=None published without this diff, predictor run with it (P1872444989)
```
rm -rf /tmp/pt2_archive_mix && buck2 run mode/opt deeplearning/aot_inductor/cpu:cli -- --local-model-path ~/testing/742055223_194.input.predictor.local --submodule mix --preset ads_first_stage_ranking_dsnn --lowering-backend aotinductor_ep
```
```
buck2 run mode/opt caffe2/torch/fb/model_transform/fx2trt/packaging:load_net_predictor -- --loadMode=Benchmark --inputNetFile /tmp/pt2_archive_mix/package.zip --moduleName=mix --submodToDevice "" --using_aoti_lowering_allowlist=false --benchmarkDontRebatchSamples
```
Succeeded: model without bias=None published with this diff, predictor run without it (P1872468850)
Succeeded: model without bias=None published without this diff, predictor run with it (P1872474897)
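The four cross-version runs above form a small compatibility matrix. The sketch below encodes the recorded outcomes (paste IDs kept as comments) and labels each run by direction, using the usual PyTorch convention that forward compatibility means an older runtime loading an artifact produced by newer code. The tuple layout and the `classify` helper are hypothetical, introduced only for this illustration.

```python
# Recorded outcomes from the BC & FC test plan.
# Key: (model_has_bias_none, publisher_has_diff, predictor_has_diff) -> succeeded
RESULTS = {
    (True, True, False): False,   # P1872435999
    (True, False, True): True,    # P1872444989
    (False, True, False): True,   # P1872468850
    (False, False, True): True,   # P1872474897
}


def classify(publisher_has_diff: bool, predictor_has_diff: bool) -> str:
    """Name the compatibility direction a cross-version run exercises."""
    if publisher_has_diff and not predictor_has_diff:
        return "forward compatibility (old runtime, new artifact)"
    if predictor_has_diff and not publisher_has_diff:
        return "backward compatibility (new runtime, old artifact)"
    return "same version"


# Collect every failing run together with its direction.
failures = [
    (key, classify(key[1], key[2])) for key, ok in RESULTS.items() if not ok
]
for key, direction in failures:
    print(key, direction)
```

Under this reading, the only failure is in the forward-compatibility direction and only for the model that actually uses bias=None; both backward-compatibility runs succeeded.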

Rollback Plan:

Differential Revision: D78458214
@hl475 force-pushed the export-D78458214 branch from 434eb6e to 919db33 (July 21, 2025 14:36)

@hl475 changed the title from "[WIP] bias None case" to "[AOTI][CPU] Consider bias=None case for fbgemm_linear_fp16_weight" (Jul 21, 2025)
@pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Jul 21, 2025
@hl475 requested reviews from desertfire, houseroad, and muchulee8 (July 21, 2025 20:52)
@facebook-github-bot

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

saienduri pushed a commit to saienduri/pytorch that referenced this pull request Jul 22, 2025

Labels

ciflow/inductor, ciflow/trunk, fb-exported, Merged, module: cpu, module: inductor, release notes: inductor (aoti), release notes: quantization


6 participants