[Quant][PT2E][X86] annotate and convert for linear_dynamic_fp16 by Xia-Weiwen · Pull Request #141480 · pytorch/pytorch · GitHub

Conversation

@Xia-Weiwen (Collaborator) commented Nov 25, 2024

Stack from ghstack (oldest at bottom):

Annotate the linear node for linear_dynamic_fp16 with X86InductorQuantizer.
After convert_pt2e, the pattern will be:

  x
  |
linear <- to_fp32 <- to_fp16 <- w

Test plan

pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_dynamic_fp16

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ezyang @SherlockNoMad @EikanWang @wenzhe-nrv
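
For illustration, here is a minimal end-to-end sketch of the flow described above, using the standard PT2E export/prepare/convert APIs. The config helper name `get_x86_inductor_linear_dynamic_fp16_config` is an assumption based on this PR stack, not a documented API:

```python
import torch
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x):
        return self.linear(x)


m = M().eval()
example_inputs = (torch.randn(2, 16),)

# Capture the model with the PT2E export entry point.
exported = torch.export.export_for_training(m, example_inputs).module()

# Configure the quantizer for linear_dynamic_fp16 (helper name assumed from this PR stack).
quantizer = X86InductorQuantizer()
quantizer.set_global(xiq.get_x86_inductor_linear_dynamic_fp16_config())

prepared = prepare_pt2e(exported, quantizer)
converted = convert_pt2e(prepared)
# After convert_pt2e, the weight feeding the linear carries the to_fp16 -> to_fp32 cast chain shown above.
```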

@pytorch-bot (bot) commented Nov 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141480

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1c827ef with merge base 2398e75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: quantization release notes category label Nov 25, 2024
@Xia-Weiwen Xia-Weiwen marked this pull request as draft November 25, 2024 07:44
Xia-Weiwen added a commit that referenced this pull request Nov 25, 2024
@Xia-Weiwen Xia-Weiwen requested review from jgong5 and leslie-fang-intel and removed request for jgong5 and leslie-fang-intel November 26, 2024 01:12
@Xia-Weiwen Xia-Weiwen added the intel This tag is for PR from Intel label Nov 26, 2024
@Xia-Weiwen Xia-Weiwen changed the title [Quant][PT2E] annotate and convert for linear_dynamic_fp16 [Quant][PT2E][CPU] annotate and convert for linear_dynamic_fp16 Nov 26, 2024
@Xia-Weiwen Xia-Weiwen changed the title [Quant][PT2E][CPU] annotate and convert for linear_dynamic_fp16 [Quant][PT2E][X86] annotate and convert for linear_dynamic_fp16 Nov 26, 2024
    graph.erase_node(node)
elif dtype == torch.float16:
    raise NotImplementedError("decomposed to float16 op not implemented yet")
    quantize_op = torch.ops.quantized_decomposed.quantize_per_tensor.default
Collaborator:

It seems that for dtype == torch.float16, torch.ops.quantized_decomposed.quantize_per_tensor.default has the same semantics as to(dtype=torch.float16). Why not just use to(dtype=torch.float16)?

Collaborator Author (Xia-Weiwen):

Because to will be constant-folded and the pattern will be hard to match.

Contributor (@jerryzh168), Nov 27, 2024:

I also feel that using to might be better; that way we won't have multiple ops doing the same thing. Wondering what would be needed to use to here.

Collaborator Author (@Xia-Weiwen), Nov 27, 2024:

Unfortunately, to will be folded by Inductor. Here is the implementation with to:
https://github.com/pytorch/pytorch/blob/76ad4bb890b66098672e1f0349e83e28f2d6d85e/torch/ao/quantization/fx/convert.py#L345C1-L357C1
The pattern I got in Inductor is x/w -> linear; no to is seen.
If we insert quant/dequant instead, the dequant op is visible in Inductor, giving the pattern x / (w -> dequant) -> linear.
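
To make that concrete, here is a hypothetical fx-graph sketch of the idea (not the PR's actual convert code): explicit cast nodes are inserted on the weight instead of a `to` call, so a `w -> to_fp16 -> to_fp32 -> linear` chain stays visible for pattern matching. The PR itself ends up registering a dedicated non-foldable cast op for this purpose:

```python
import torch
from torch.fx import GraphModule, Node


def insert_weight_fp16_cast(gm: GraphModule, weight_node: Node) -> None:
    """Illustrative only: cast the weight to fp16 and back to fp32 with explicit
    graph nodes so the pattern survives into Inductor's pattern matcher."""
    graph = gm.graph
    users = list(weight_node.users)  # e.g. the linear node consuming the weight
    with graph.inserting_after(weight_node):
        to_fp16 = graph.call_function(
            torch.ops.prims.convert_element_type.default,  # the PR uses a non-foldable variant of this cast
            (weight_node, torch.float16),
        )
    with graph.inserting_after(to_fp16):
        to_fp32 = graph.call_function(
            torch.ops.prims.convert_element_type.default,
            (to_fp16, torch.float32),
        )
    # Rewire only the original consumers; the new cast chain still reads the raw weight.
    for user in users:
        user.replace_input_with(weight_node, to_fp32)
    gm.recompile()
```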

Contributor (@jerryzh168):

I'm wondering if we can create a variant of torch.ops.prims.convert_element_type.default, say torch.ops.prims.convert_element_type.no_fuse, and add it to the same list as torch.ops.quantized_decomposed.dequantize_per_channel.default so it's not fused by Inductor?

Collaborator Author (Xia-Weiwen):

@jerryzh168 Thanks for the suggestion. Do you know how to add a new variant?

Contributor (@jerryzh168):

It should be the same as adding a new custom op:

@impl(quantized_decomposed_lib, "quantize_per_tensor", "CompositeExplicitAutograd")
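
For reference, a minimal sketch of registering such an op with torch.library, mirroring the registration pattern of the existing quantized_decomposed ops; the implementation body is only an illustration of the intended semantics (a plain dtype cast exposed as its own op):

```python
import torch
from torch.library import Library, impl

# The real library fragment lives in torch/ao/quantization/fx/_decomposed.py;
# a standalone fragment is created here only to keep the sketch self-contained.
quantized_decomposed_lib = Library("quantized_decomposed", "FRAGMENT")

quantized_decomposed_lib.define(
    "convert_element_type.no_fuse(Tensor input, ScalarType dtype) -> Tensor"
)


@impl(quantized_decomposed_lib, "convert_element_type.no_fuse", "CompositeExplicitAutograd")
def convert_element_type_no_fuse(input: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Same math as input.to(dtype); keeping it as a distinct op prevents it from
    # being treated like an ordinary `to` cast by downstream passes.
    return torch.ops.aten.to.dtype(input, dtype)
```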

Collaborator Author (Xia-Weiwen):

I see. So we are actually adding a new op in the quantized_decomposed namespace?

Contributor (@jerryzh168):

Yeah, you can do that. Ideally it would live in torch.ops.prims, but I'm not sure about the process there; I can check.

Collaborator Author (Xia-Weiwen):

Sure. I have updated the PR. Please take a look. Thanks.

warnings.warn(
    "Mixed dynamic and static quantization config is not supported."
)
need_skip = True
Collaborator:

Why do we need to remove this code?

Collaborator Author (Xia-Weiwen):

Because we always set is_dynamic=False for linear_dynamic_fp16, but it needs to work with dynamic quantization of other ops. Besides, I didn't see an issue when users mix static and dynamic quantization for different ops.
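
For context, a rough sketch of what the fp16 weight annotation can look like. The observer choice and field values are assumptions for illustration, not necessarily the exact spec this PR uses:

```python
import torch
from torch.ao.quantization.observer import PlaceholderObserver
from torch.ao.quantization.quantizer import QuantizationSpec

# The weight is only cast to fp16 (no scale/zero_point), so a placeholder observer
# suffices and is_dynamic stays False even when other ops use dynamic quantization.
weight_spec = QuantizationSpec(
    dtype=torch.float16,
    observer_or_fake_quant_ctr=PlaceholderObserver.with_args(dtype=torch.float16),
    is_dynamic=False,
)
```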

Collaborator:

Then I feel this flag is not needed any more; could we just remove it? cc @yiliu30

Collaborator Author (Xia-Weiwen):

Thanks. I have reverted this change because it is not an issue for now.

Collaborator (@jgong5) left a comment:

I have the same comment as @leslie-fang-intel. It seems simpler to just insert to(dtype=torch.half) instead of "quantize" in the graph.

Collaborator Author (@Xia-Weiwen):

> I have the same comment as @leslie-fang-intel. It seems simpler to just insert to(dtype=torch.half) instead of "quantize" in the graph.

Unfortunately, to will be constant-folded, which makes the pattern difficult to match. So we need to insert quant & dequant nodes on the graph for pattern matching.

@Xia-Weiwen Xia-Weiwen requested a review from jgong5 November 27, 2024 02:42
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review November 27, 2024 08:43


quantized_decomposed_lib.define(
    "convert_element_type.no_fuse(Tensor input, ScalarType dtype) -> Tensor"
)
Contributor (@jerryzh168):

Do you need to add this op to the same list as torch.ops.quantized_decomposed.dequantize_per_channel.default so it's not constant-folded?
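
Conceptually, that follow-up amounts to extending the set of ops the constant folder skips. The names below are purely hypothetical placeholders for illustration, not the real Inductor code:

```python
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  # registers the quantized_decomposed ops

# Hypothetical illustration only: the real skip list lives in Inductor's
# constant-folding pass under a different name.
DONT_CONSTANT_FOLD = {
    torch.ops.quantized_decomposed.dequantize_per_channel.default,
    # Added in the follow-up so the fp16 weight cast is not folded into the weight
    # (assumes the no_fuse op from this PR is available):
    torch.ops.quantized_decomposed.convert_element_type.no_fuse,
}
```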

Collaborator Author (Xia-Weiwen):

I have added this in the next PR. Thanks.

Contributor (@jerryzh168) left a comment:

LG, thanks!

Contributor (@jerryzh168):

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 29, 2024
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

Advanced Debugging: check the merge workflow status here.

GeorgeWigley pushed a commit to graphcore/pytorch-fork that referenced this pull request Nov 29, 2024
[Quant][PT2E][X86] annotate and convert for linear_dynamic_fp16 (pytorch#141480)

Annotate linear node for `linear_dynamic_fp16` with `X86InductorQuantizer`
After `convert_pt2e`, the pattern will be
```
  x
  |
linear <- to_fp32 <- to_fp16 <- w
```

**Test plan**
```
pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_linear_dynamic_fp16
```

Pull Request resolved: pytorch#141480
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Esquains pushed a commit to Esquains/study1 that referenced this pull request Dec 15, 2024
@github-actions github-actions bot deleted the gh/Xia-Weiwen/20/head branch December 30, 2024 02:07

Labels

ciflow/trunk · fx · intel · Merged · open source · release notes: quantization
