[quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem by jerryzh168 · Pull Request #113405 · pytorch/pytorch · GitHub

Conversation

@jerryzh168
Contributor

@jerryzh168 jerryzh168 commented Nov 10, 2023

Stack from ghstack (oldest at bottom):

Summary:
As titled; this is because the histogram observer does not work for a corner case in mobilebert (observing a scalar tensor holding the float32 max value):
the histc operator errors out when the observed value is larger than a certain threshold

Test Plan:
python test/test_quantization.py -k test_mul_float32_max

Reviewers:

Subscribers:

Tasks:

Tags:
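The failure mode in the summary can be illustrated without torch: a histogram-style observer does arithmetic on the observed range in float32, and that arithmetic overflows to inf once the observed value is already at the float32 max, while min/max tracking only compares values. A minimal numpy sketch (numpy's float32 stands in for torch's float32 math, and the widening factor is a hypothetical stand-in for HistogramObserver's internal bin arithmetic, not the actual implementation):

```python
import numpy as np

f32_max = np.finfo(np.float32).max  # ~3.4028235e38, the value observed in mobilebert

# Histogram-style range arithmetic: any widening/scaling of the range
# overflows to inf in float32, which is roughly why torch.histc errors out.
with np.errstate(over="ignore"):
    widened = np.float32(f32_max) * np.float32(2.0)
print(np.isinf(widened))  # True

# Min/max-style tracking only compares values, so it stays finite.
running_max = np.maximum(np.float32(0.0), np.float32(f32_max))
print(np.isfinite(running_max))  # True
```

This is why the discussion below converges on using a min/max observer, or skipping observation entirely, for such scalar constants.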

…for activation

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added release notes: quantization release notes category labels Nov 10, 2023
@pytorch-bot

pytorch-bot bot commented Nov 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113405

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 44d3195 with merge base 066ac56:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 added a commit that referenced this pull request Nov 10, 2023
…for activation

ghstack-source-id: 4b95529
Pull Request resolved: #113405
    act_observer_or_fake_quant_ctr = PlaceholderObserver  # type: ignore[assignment]
else:
-   act_observer_or_fake_quant_ctr = HistogramObserver  # type: ignore[assignment]
+   act_observer_or_fake_quant_ctr = MinMaxObserver if ptq_act_use_minmax_observer else HistogramObserver  # type: ignore[assignment]
Contributor

Will all observers for mul then be MinMax? It seems like we are just using MinMaxObserver to get around the float32 max issue; is there a way to use MinMaxObserver only for this case, and HistogramObserver for the other activations?

to be fair, I don't truly understand how this will affect the numerics

cc. @kimishpatel

Contributor Author

@jerryzh168 jerryzh168 Nov 10, 2023

with the current support we'll probably just use the global configuration (which sets all ops to use this config)

we can try to support enabling this just for a single mul op, but we'd need to implement that in the xnnpack quantizer

btw, this change does not alter any existing tests; it's an optional flag that allows people to use the min/max observer.

for mobilebert in the executorch repo, do we have accuracy goals? or is it just to make sure that quantization and lowering work?

Contributor

Yeah, as I said in another comment, I don't think this is the right way to do it. Max, would you mind fixing this yourself? If HistogramObserver truly cannot handle this, and HistogramObserver does not seem like the right place to fix the issue, then it belongs in the quantizer, which should make this decision. @mcr229 if you can do this, that would be great. I don't want this to be a global config.

    act_observer_or_fake_quant_ctr = PlaceholderObserver  # type: ignore[assignment]
else:
-   act_observer_or_fake_quant_ctr = HistogramObserver  # type: ignore[assignment]
+   act_observer_or_fake_quant_ctr = MinMaxObserver if ptq_act_use_minmax_observer else HistogramObserver  # type: ignore[assignment]
Contributor

  • This doesn't seem like the right way to fix this. Also, can you explain the underlying issue? If the scalar tensor didn't have float max but some other value, would that work with the histogram observer?

  • Orthogonally, why does the histogram observer not work with a single value of float max?

  • For fixing this, I would suggest that we change the observer only for the specific nodes/edges you are trying to annotate that have a scalar tensor, and not do it in a blanket manner.

Contributor Author

  • The histogram observer uses torch.histc; for a scalar tensor it only works up to some value, not all the way to float_max. The largest value it can handle is around float_max / 10e15 or something like that.
  • It's probably related to the histc implementation; I didn't look closely at this.
  • Sure. I think that would require the user to walk through the graph and annotate the nodes before applying the global xnnpack quantizer, but this configuration is still needed, I think?

Contributor

  • histogram observer is using torch.histc, for scalar tensor, it only works up to some values, not to float_max, the largest that it can work is around float_max/10e15 or something like that

Jerry, which op's quantization, via scalar to tensor pass, is triggering this?

Contributor Author

This is mul/add; both trigger this, I think.

Contributor

sure, I think that would require in the user side to just walk through the graph and annotate the nodes before applying global xnnpack quantizer, but this configuration is still needed I think?

It shouldn't be done by the user; the xnnpack quantizer should be able to do this in its annotation logic.

Contributor

@kimishpatel kimishpatel left a comment

This needs to happen as per-node annotation.

@jerryzh168
Contributor Author

this needs to happen as per node annotation

@kimishpatel this is not a change to how mobilebert is quantized; the main purpose of this PR is to add the configuration. We can do per-node annotation when quantizing mobilebert in executorch.

…x observer for activation"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 13, 2023
…for activation

ghstack-source-id: 86f6cfd
Pull Request resolved: #113405
@jerryzh168 jerryzh168 dismissed kimishpatel’s stale review November 13, 2023 22:14

please take a look again

@kimishpatel
Contributor

kimishpatel commented Nov 15, 2023

this needs to happen as per node annotation

@kimishpatel this is not a change of how the mobilebert is quantized, the main purpose for the PR is to add the configuration, we can do per node annotation when quantizing mobilebert in executorch

Which configuration? The ptq_act_use_minmax_observer option? I don't think this should be an option for the user to configure. It should be done by the xnnpack quantizer during annotation:

  • Check if the tensor has a single element (due to scalar-to-tensor conversion), and if so, extract the value from the tensor and validate whether it is the float32 max.
  • Alternatively, during transform_for_annotation, record the nodes whose other argument is a scalar, check the value of that scalar, and later, in annotate, use min/max observers for those args.

These don't require adding any configuration.
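The annotation-time check described above can be sketched in a few lines of plain Python. All names here (needs_minmax_observer, pick_observer, the threshold) are hypothetical illustrations, not the actual XNNPACKQuantizer API; the threshold echoes the rough float_max / 10e15 histc limit mentioned earlier in the thread:

```python
F32_MAX = 3.4028234663852886e38  # float32 max, the value mobilebert feeds to mul

def needs_minmax_observer(scalar_value: float, threshold: float = F32_MAX / 1e15) -> bool:
    """Hypothetical per-node check: fall back to a min/max observer when a
    scalar constant is larger than what torch.histc is reported to handle."""
    return abs(scalar_value) > threshold

def pick_observer(scalar_value: float) -> str:
    # During annotation, a quantizer could branch per input rather than
    # flipping a global config flag.
    return "MinMaxObserver" if needs_minmax_observer(scalar_value) else "HistogramObserver"
```

With a guard like this, annotate_add/annotate_mul could keep the histogram observer as the default and only special-case the problematic scalar constants.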

@jerryzh168
Contributor Author


I see, sounds good. I can change annotate_add/annotate_mul etc. directly then. It's not only max that histc fails on; any value greater than some large threshold will fail. Maybe we could just use the min/max observer for all scalar values? Does that sound good?

…x observer for activation"

[ghstack-poisoned]
@jerryzh168 jerryzh168 changed the title [quant][pt2e][xnnpack] Allow XNNPACKQuantizer to use minmax observer for activation [quant][pt2e][xnnpack] XNNPACKQuantizer use minmax observer for input and output in some cases Nov 16, 2023
…r for input and output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 16, 2023
… and output in some cases

ghstack-source-id: 94eff22
Pull Request resolved: #113405
…r for input and output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 16, 2023
… and output in some cases

ghstack-source-id: fb3a0a1
Pull Request resolved: #113405
@jerryzh168 jerryzh168 changed the title [quant][pt2e][xnnpack] XNNPACKQuantizer use minmax observer for input and output in some cases [quant][pt2e][xnnpack] XNNPACKQuantizer use FixedQParamsObserver for input and MinMaxObserver for output in some cases Nov 18, 2023
…server for input and MinMaxObserver for output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 18, 2023
…input and MinMaxObserver for output in some cases

ghstack-source-id: 0cf42f4
Pull Request resolved: #113405
…server for input and MinMaxObserver for output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Dec 1, 2023
…input and MinMaxObserver for output in some cases

ghstack-source-id: fdce6f4
Pull Request resolved: #113405
@jerryzh168 jerryzh168 changed the title [quant][pt2e][xnnpack] XNNPACKQuantizer use FixedQParamsObserver for input and MinMaxObserver for output in some cases [quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem Dec 1, 2023
Contributor

@mcr229 mcr229 left a comment

Sure, I think this looks fine.
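As the final PR title indicates, the approach that was eventually merged skips quantization for the affected input and output rather than swapping observers. A rough pure-Python sketch of that decision; qspec_for_scalar_input, the "int8_affine" placeholder, and the threshold are illustrative only, not the real quantizer code:

```python
F32_MAX = 3.4028234663852886e38  # float32 max

def qspec_for_scalar_input(value: float, default_qspec: str = "int8_affine"):
    """Illustrative version of the merged behavior: return None (no observer,
    no quantization) for a scalar input too large for the histogram observer,
    and the normal activation qspec otherwise."""
    if abs(value) > F32_MAX / 1e15:  # rough histc limit mentioned in the thread
        return None  # the input/output stays in float; no observer is inserted
    return default_qspec
```

Returning None for the quantization spec means the problematic scalar tensor is simply left unquantized, which sidesteps the histc failure without changing numerics for any other activation.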

…for input and output to workaround histogram observer problem"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Dec 1, 2023
…input and MinMaxObserver for output in some cases

ghstack-source-id: c0e5216
Pull Request resolved: #113405
@jerryzh168
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 1, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/jerryzh168/932/head branch December 5, 2023 15:28
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
…nd output to workaround histogram observer problem (pytorch#113405)

Pull Request resolved: pytorch#113405
Approved by: https://github.com/mcr229

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: quantization release notes category
