-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem #113405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…for activation Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113405
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 44d3195 with merge base 066ac56 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
…for activation Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 4b95529 Pull Request resolved: #113405
| act_observer_or_fake_quant_ctr = PlaceholderObserver # type: ignore[assignment] | ||
| else: | ||
| act_observer_or_fake_quant_ctr = HistogramObserver # type: ignore[assignment] | ||
| act_observer_or_fake_quant_ctr = MinMaxObserver if ptq_act_use_minmax_observer else HistogramObserver # type: ignore[assignment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will all observers for mul then be MinMax? It seems like we are just using MinMaxObserver to get around the sys.floatmax, is there a way to use MinMaxObserver only for this case, and use HistogramObserver for the other activations?
to be fair, I don't truly understand how this will affect the numerics
cc. @kimishpatel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the current support we'll probably just use global configuration (sets all ops to use this config)
we can try to support enabling this just for a single mul op, but we'll need to implement this in xnnpack quantizer
this change does not change any existing tests btw, it's an optional flag that allows people to use minmax observer.
for mobilebert in executorch repo, do we have accuracy goals? or is it just to make sure that the quantization and lowering works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah as I commented in another comment. I dont think this is the right way to do it. Max would you mind fixing this yourself. If HistogramObserver cannot truly handle this and HistogramObsever does not seem the right place to fix this issue, then it belong to quantizer, who should make this decision. @mcr229 if you can do this, that would be great. I dont want this to be a global config.
| act_observer_or_fake_quant_ctr = PlaceholderObserver # type: ignore[assignment] | ||
| else: | ||
| act_observer_or_fake_quant_ctr = HistogramObserver # type: ignore[assignment] | ||
| act_observer_or_fake_quant_ctr = MinMaxObserver if ptq_act_use_minmax_observer else HistogramObserver # type: ignore[assignment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
this doenst seem like the right way to fix this. Also can you explain the underlying issue? If the scalar tensor didnt have float max, but some other value, would that work with histogram observer?
-
Orthogonally, why does histogram observer not work with single value or float max?
-
For fixing this, I would suggest that we specifically change observer only for nodes/edge, that you are trying to annotate, that has a scalar tensor. And not do that in a blanket manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- histogram observer is using torch.histc, for scalar tensor, it only works up to some values, not to float_max, the largest that it can work is around float_max/10e15 or something like that
- probably related to histc implementation, I didn't look closely on this
- sure, I think that would require in the user side to just walk through the graph and annotate the nodes before applying global xnnpack quantizer, but this configuration is still needed I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- histogram observer is using torch.histc, for scalar tensor, it only works up to some values, not to float_max, the largest that it can work is around float_max/10e15 or something like that
Jerry, which op's quantization, via scalar to tensor pass, is triggering this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is mul/add, both triggers this I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I think that would require in the user side to just walk through the graph and annotate the nodes before applying global xnnpack quantizer, but this configuration is still needed I think?
It shouldnt be done by user. xnnpack quantizer should be able to do this in its annotation logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to happen as per node annotation
@kimishpatel this is not a change of how the mobilebert is quantized, the main purpose for the PR is to add the configuration, we can do per node annotation when quantizing mobilebert in executorch |
…x observer for activation" Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…for activation Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 86f6cfd Pull Request resolved: #113405
which configuration?
These dont require adding any configuration |
I see, sounds good, I can change annotate_add/annotate_mul etc. directly then, it's not just max that histc fails, as long as it's great than some large value it will fail, maybe we could just use minmax observer for all scalar values? does that sound good? |
…x observer for activation" Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…r for input and output in some cases" Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
… and output in some cases Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 94eff22 Pull Request resolved: #113405
…r for input and output in some cases" Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
… and output in some cases Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fb3a0a1 Pull Request resolved: #113405
…server for input and MinMaxObserver for output in some cases" Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…input and MinMaxObserver for output in some cases Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 0cf42f4 Pull Request resolved: #113405
…server for input and MinMaxObserver for output in some cases" Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…input and MinMaxObserver for output in some cases Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fdce6f4 Pull Request resolved: #113405
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I think this looks fine
…for input and output to workaround histogram observer problem" Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…input and MinMaxObserver for output in some cases Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: c0e5216 Pull Request resolved: #113405
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…nd output to workaround histogram observer problem (pytorch#113405) Summary: att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value) because histc operator errors out when the value is larger than certain number Test Plan: python test/test_quantization.py -k test_mul_float32_max Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: pytorch#113405 Approved by: https://github.com/mcr229
Stack from ghstack (oldest at bottom):
Summary:
att, this is because histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of float32 max value)
because histc operator errors out when the value is larger than certain number
Test Plan:
python test/test_quantization.py -k test_mul_float32_max
Reviewers:
Subscribers:
Tasks:
Tags: