[quant][pt2e] Enable constant folding for quantize ops #109343
Conversation
Summary: This PR adds constant folding for quantize ops so that, instead of storing fp32 weights in the quantized model, we store int8/int16 (etc.) weights.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_fold_quantize (we will also verify in ExecuTorch later)
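For context, here is a minimal sketch of what folding a quantize op means for a weight: the quantize computation is evaluated once at convert time and only its low-precision result is kept. The helper name and the scale/zero-point values below are made up for illustration; this is not the pass itself.

```python
import torch

def quantize_per_tensor_ref(w, scale, zero_point, qmin=-128, qmax=127, dtype=torch.int8):
    # Reference affine quantization: q = clamp(round(w / scale) + zp, qmin, qmax)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return q.to(dtype)

fp32_weight = torch.randn(128, 256)  # what the model stored before folding
int8_weight = quantize_per_tensor_ref(fp32_weight, scale=0.02, zero_point=0)

# After folding, the graph keeps int8_weight as a constant attribute; the
# get_attr(fp32_weight) -> quantize node pair is gone and only dequantize
# remains at runtime.
print(fp32_weight.numel() * fp32_weight.element_size())  # 131072 bytes (fp32)
print(int8_weight.numel() * int8_weight.element_size())  #  32768 bytes (int8)
```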
🔗 Helpful links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109343. Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit 33ef51f with merge base 3262c53. This comment was automatically generated by Dr. CI and updates every 15 minutes.
I think we should const-fold only things related to the quant workflow. Plus, the inductor-specific utils do not seem like the right thing to use here; we should probably factor those out if we want to use them here.
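To make the point concrete, here is a rough sketch of folding gated by a quant-specific constraint rather than reusing general folding utilities as-is. Everything in it (quant_constraint, fold_constants_with_constraint, the string-based op check) is hypothetical illustration, not the PR's code or the inductor API.

```python
import torch
from torch import fx

def quant_constraint(node: fx.Node) -> bool:
    # Crude stand-in: only consider quantize ops, never dequantize.
    s = str(node.target)
    return node.op == "call_function" and "quantize_per" in s and "dequantize" not in s

def fold_constants_with_constraint(gm: fx.GraphModule, constraint) -> None:
    for node in list(gm.graph.nodes):
        if not constraint(node):
            continue
        node_args = [a for a in node.args if isinstance(a, fx.Node)]
        # Only fold when every tensor input is a constant attribute
        # (assumes flat, non-dotted attribute names for brevity).
        if not node_args or any(a.op != "get_attr" for a in node_args):
            continue
        if any(isinstance(v, fx.Node) for v in node.kwargs.values()):
            continue
        args = [getattr(gm, a.target) if isinstance(a, fx.Node) else a for a in node.args]
        folded = node.target(*args, **node.kwargs)   # evaluate the quantize op once, now
        name = node.name + "_folded"
        gm.register_buffer(name, folded)             # store the int8/int16 result
        with gm.graph.inserting_before(node):
            const_node = gm.graph.get_attr(name)
        node.replace_all_uses_with(const_node)
        gm.graph.erase_node(node)
    gm.graph.eliminate_dead_code()                   # drops the now-unused fp32 weights
    gm.graph.lint()
    gm.recompile()
```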
@kimishpatel I saw that the metadata for the fused get_attr node is gone right now, but what should it be? E.g. in cases like (get_attr(weight) -> transpose -> quantize)
For the most part I would assume that it is the same between get_attr and transpose. But I imagine porting metadata from the transpose node should be OK for now. Just add checks to ensure that all nodes between get_attr and quantize have the same source_fn and nn_module_stack, just so that we can catch such cases when they arise and think of the fix we need.
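A small sketch of the check suggested here: before porting metadata from the node feeding the quantize op (e.g. the transpose), verify that every node on the chain between the get_attr and the quantize op agrees on source_fn and nn_module_stack, so diverging cases fail loudly. The helper and its traversal are illustrative only; the exact meta key spellings may differ across versions.

```python
from torch import fx

def check_consistent_meta(start: fx.Node, end: fx.Node,
                          keys=("source_fn", "nn_module_stack")) -> None:
    """Walk the single-input chain end -> ... -> start and verify metadata matches."""
    chain, node = [], end
    while node is not start:
        chain.append(node)
        inputs = [a for a in node.args if isinstance(a, fx.Node)]
        assert len(inputs) == 1, f"expected a simple chain, got {node.format_node()}"
        node = inputs[0]
    for key in keys:
        values = {repr(n.meta.get(key)) for n in chain}
        assert len(values) <= 1, (
            f"nodes between {start} and {end} disagree on {key}: {values}"
        )
```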
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Merge failed. Reason: 1 job has failed; the first few are: trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral). Details for Dev Infra team: raised by workflow job.
Summary: This PR added constant folding for quantize ops so that, instead of storing fp32 weights in the quantized model, we get int8/int16 (etc.) weights. Not enabled by default to preserve BC.

Test Plan: python test/test_quantization.py TestQuantizePT2E.test_fold_quantize (also will verify in ExecuTorch later)

ghstack-source-id: f53b89f
Pull Request resolved: #109343
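Given the BC note above, folding is opt-in at this point. A sketch of how it might be exercised end to end follows; the module paths reflect the pt2e flow around this time, and the fold_quantize keyword is an assumption based on the note above, so the exact spelling and default may differ.

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x):
        return self.linear(x)

example_inputs = (torch.randn(2, 16),)
m = capture_pre_autograd_graph(M().eval(), example_inputs)

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*example_inputs)                        # calibrate
m = convert_pt2e(m, fold_quantize=True)   # opt in: weights stored as int8 constants
```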
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge
Merge failed. Reason: This PR has internal changes and must be landed via Phabricator. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
prepare_pt2e, prepare_qat_pt2e and convert_pt2e #110097

Summary:
This PR adds constant folding for quantize ops so that, instead of storing fp32 weights in the
quantized model, we store int8/int16 (etc.) weights.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_fold_quantize
We will also verify this in ExecuTorch later. (A sketch of the checks this test might make follows at the end of this description.)
Reviewers:
Subscribers:
Tasks:
Tags:
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @ngimel
Differential Revision: D49399210
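As referenced in the Test Plan, here is a rough sketch of the kind of assertions a test like TestQuantizePT2E.test_fold_quantize could make on the converted model; the specific checks (constant dtypes, no quantize op fed by a constant) are a paraphrase of the PR's intent, not the test's actual body.

```python
import torch

def _get_attr(gm: torch.fx.GraphModule, qualified_name: str):
    # Resolve possibly-dotted attribute paths like "linear.weight".
    obj = gm
    for part in qualified_name.split("."):
        obj = getattr(obj, part)
    return obj

def assert_weights_folded(gm: torch.fx.GraphModule) -> None:
    # At least one constant in the converted graph should already be low precision.
    const_dtypes = set()
    for node in gm.graph.nodes:
        if node.op == "get_attr":
            value = _get_attr(gm, node.target)
            if isinstance(value, torch.Tensor):
                const_dtypes.add(value.dtype)
    assert const_dtypes & {torch.int8, torch.int16, torch.uint8}, (
        f"expected a folded low-precision weight constant, found {const_dtypes}"
    )

    # And no quantize op should still be fed directly by a constant attribute.
    for node in gm.graph.nodes:
        s = str(node.target)
        if node.op == "call_function" and "quantize_per" in s and "dequantize" not in s:
            assert not any(
                isinstance(a, torch.fx.Node) and a.op == "get_attr" for a in node.args
            ), f"quantize op still fed by a constant weight: {node.format_node()}"
```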