[quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem by jerryzh168 · Pull Request #113405 · pytorch/pytorch · GitHub

Conversation

@jerryzh168
Contributor

@jerryzh168 jerryzh168 commented Nov 10, 2023

Stack from ghstack (oldest at bottom):

Summary:
As titled; this is because the histogram observer does not work for a corner case in mobilebert (observing a scalar tensor holding the float32 max value):
the histc operator errors out when the observed value is larger than a certain threshold

Test Plan:
python test/test_quantization.py -k test_mul_float32_max

Reviewers:

Subscribers:

Tasks:

Tags:
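The failure mode in the summary can be illustrated without torch: a histogram-style observer does arithmetic on the observed range in float32, and that arithmetic overflows to inf once the observed value is already at the float32 max, while min/max tracking only compares values. A minimal numpy sketch (numpy's float32 stands in for torch's float32 math, and the widening factor is a hypothetical stand-in for HistogramObserver's internal bin arithmetic, not the actual implementation):

```python
import numpy as np

f32_max = np.finfo(np.float32).max  # ~3.4028235e38, the value observed in mobilebert

# Histogram-style range arithmetic: any widening/scaling of the range
# overflows to inf in float32, which is roughly why torch.histc errors out.
with np.errstate(over="ignore"):
    widened = np.float32(f32_max) * np.float32(2.0)
print(np.isinf(widened))  # True

# Min/max-style tracking only compares values, so it stays finite.
running_max = np.maximum(np.float32(0.0), np.float32(f32_max))
print(np.isfinite(running_max))  # True
```

This is why the discussion below converges on using a min/max observer, or skipping observation entirely, for such scalar constants.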

…for activation

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added release notes: quantization release notes category labels Nov 10, 2023
@pytorch-bot

pytorch-bot bot commented Nov 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113405

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 44d3195 with merge base 066ac56:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 added a commit that referenced this pull request Nov 10, 2023
…for activation

ghstack-source-id: 4b95529
Pull Request resolved: #113405
    act_observer_or_fake_quant_ctr = PlaceholderObserver  # type: ignore[assignment]
else:
-   act_observer_or_fake_quant_ctr = HistogramObserver  # type: ignore[assignment]
+   act_observer_or_fake_quant_ctr = MinMaxObserver if ptq_act_use_minmax_observer else HistogramObserver  # type: ignore[assignment]
Contributor

Will all observers for mul then be MinMax? It seems like we are just using MinMaxObserver to get around the float32 max issue; is there a way to use MinMaxObserver only for this case, and HistogramObserver for the other activations?

to be fair, I don't truly understand how this will affect the numerics

cc. @kimishpatel

Contributor Author

@jerryzh168 jerryzh168 Nov 10, 2023

with the current support we'll probably just use the global configuration (which sets all ops to use this config)

we can try to support enabling this just for a single mul op, but we'd need to implement that in the xnnpack quantizer

btw, this change does not alter any existing tests; it's an optional flag that allows people to use the min/max observer.

for mobilebert in the executorch repo, do we have accuracy goals? or is it just to make sure that quantization and lowering work?

Contributor

Yeah, as I said in another comment, I don't think this is the right way to do it. Max, would you mind fixing this yourself? If HistogramObserver truly cannot handle this, and HistogramObserver does not seem like the right place to fix the issue, then it belongs in the quantizer, which should make this decision. @mcr229 if you can do this, that would be great. I don't want this to be a global config.

    act_observer_or_fake_quant_ctr = PlaceholderObserver  # type: ignore[assignment]
else:
-   act_observer_or_fake_quant_ctr = HistogramObserver  # type: ignore[assignment]
+   act_observer_or_fake_quant_ctr = MinMaxObserver if ptq_act_use_minmax_observer else HistogramObserver  # type: ignore[assignment]
Contributor

  • This doesn't seem like the right way to fix this. Also, can you explain the underlying issue? If the scalar tensor didn't have float max but some other value, would that work with the histogram observer?

  • Orthogonally, why does the histogram observer not work with a single value of float max?

  • For fixing this, I would suggest that we change the observer only for the specific nodes/edges you are trying to annotate that have a scalar tensor, and not do it in a blanket manner.

Contributor Author

  • The histogram observer uses torch.histc; for a scalar tensor it only works up to some value, not all the way to float_max. The largest value it can handle is around float_max / 10e15 or something like that.
  • It's probably related to the histc implementation; I didn't look closely at this.
  • Sure. I think that would require the user to walk through the graph and annotate the nodes before applying the global xnnpack quantizer, but this configuration is still needed, I think?

Contributor

  • histogram observer is using torch.histc, for scalar tensor, it only works up to some values, not to float_max, the largest that it can work is around float_max/10e15 or something like that

Jerry, which op's quantization, via scalar to tensor pass, is triggering this?

Contributor Author

This is mul/add; both trigger this, I think.

Contributor

sure, I think that would require in the user side to just walk through the graph and annotate the nodes before applying global xnnpack quantizer, but this configuration is still needed I think?

It shouldn't be done by the user; the xnnpack quantizer should be able to do this in its annotation logic.

Contributor

@kimishpatel kimishpatel left a comment

This needs to happen as per-node annotation.

@jerryzh168
Contributor Author

this needs to happen as per node annotation

@kimishpatel this is not a change to how mobilebert is quantized; the main purpose of this PR is to add the configuration. We can do per-node annotation when quantizing mobilebert in executorch.

…x observer for activation"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 13, 2023
…for activation

ghstack-source-id: 86f6cfd
Pull Request resolved: #113405
@jerryzh168 jerryzh168 dismissed kimishpatel’s stale review November 13, 2023 22:14

please take a look again

@kimishpatel
Contributor

kimishpatel commented Nov 15, 2023

this needs to happen as per node annotation

@kimishpatel this is not a change of how the mobilebert is quantized, the main purpose for the PR is to add the configuration, we can do per node annotation when quantizing mobilebert in executorch

Which configuration? The ptq_act_use_minmax_observer option? I don't think this should be an option for the user to configure. It should be done by the xnnpack quantizer during annotation:

  • Check if the tensor has a single element (due to scalar-to-tensor conversion), and if so, extract the value from the tensor and validate whether it is the float32 max.
  • Alternatively, during transform_for_annotation, record the nodes whose other argument is a scalar, check the value of that scalar, and later, in annotate, use min/max observers for those args.

These don't require adding any configuration.
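The annotation-time check described above can be sketched in a few lines of plain Python. All names here (needs_minmax_observer, pick_observer, the threshold) are hypothetical illustrations, not the actual XNNPACKQuantizer API; the threshold echoes the rough float_max / 10e15 histc limit mentioned earlier in the thread:

```python
F32_MAX = 3.4028234663852886e38  # float32 max, the value mobilebert feeds to mul

def needs_minmax_observer(scalar_value: float, threshold: float = F32_MAX / 1e15) -> bool:
    """Hypothetical per-node check: fall back to a min/max observer when a
    scalar constant is larger than what torch.histc is reported to handle."""
    return abs(scalar_value) > threshold

def pick_observer(scalar_value: float) -> str:
    # During annotation, a quantizer could branch per input rather than
    # flipping a global config flag.
    return "MinMaxObserver" if needs_minmax_observer(scalar_value) else "HistogramObserver"
```

With a guard like this, annotate_add/annotate_mul could keep the histogram observer as the default and only special-case the problematic scalar constants.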

@jerryzh168
Contributor Author


I see, sounds good. I can change annotate_add/annotate_mul etc. directly then. It's not only max that histc fails on; any value greater than some large threshold will fail. Maybe we could just use the min/max observer for all scalar values? Does that sound good?

…x observer for activation"

[ghstack-poisoned]
@jerryzh168 jerryzh168 changed the title [quant][pt2e][xnnpack] Allow XNNPACKQuantizer to use minmax observer for activation [quant][pt2e][xnnpack] XNNPACKQuantizer use minmax observer for input and output in some cases Nov 16, 2023
…r for input and output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 16, 2023
… and output in some cases

ghstack-source-id: 94eff22
Pull Request resolved: #113405
…r for input and output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 16, 2023
… and output in some cases

ghstack-source-id: fb3a0a1
Pull Request resolved: #113405
@jerryzh168 jerryzh168 changed the title [quant][pt2e][xnnpack] XNNPACKQuantizer use minmax observer for input and output in some cases [quant][pt2e][xnnpack] XNNPACKQuantizer use FixedQParamsObserver for input and MinMaxObserver for output in some cases Nov 18, 2023
…server for input and MinMaxObserver for output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Nov 18, 2023
…input and MinMaxObserver for output in some cases

ghstack-source-id: 0cf42f4
Pull Request resolved: #113405
…server for input and MinMaxObserver for output in some cases"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Dec 1, 2023
…input and MinMaxObserver for output in some cases

ghstack-source-id: fdce6f4
Pull Request resolved: #113405
@jerryzh168 jerryzh168 changed the title [quant][pt2e][xnnpack] XNNPACKQuantizer use FixedQParamsObserver for input and MinMaxObserver for output in some cases [quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem Dec 1, 2023
Contributor

@mcr229 mcr229 left a comment

Sure, I think this looks fine.
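As the final PR title indicates, the approach that was eventually merged skips quantization for the affected input and output rather than swapping observers. A rough pure-Python sketch of that decision; qspec_for_scalar_input, the "int8_affine" placeholder, and the threshold are illustrative only, not the real quantizer code:

```python
F32_MAX = 3.4028234663852886e38  # float32 max

def qspec_for_scalar_input(value: float, default_qspec: str = "int8_affine"):
    """Illustrative version of the merged behavior: return None (no observer,
    no quantization) for a scalar input too large for the histogram observer,
    and the normal activation qspec otherwise."""
    if abs(value) > F32_MAX / 1e15:  # rough histc limit mentioned in the thread
        return None  # the input/output stays in float; no observer is inserted
    return default_qspec
```

Returning None for the quantization spec means the problematic scalar tensor is simply left unquantized, which sidesteps the histc failure without changing numerics for any other activation.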

…for input and output to workaround histogram observer problem"

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Dec 1, 2023
…input and MinMaxObserver for output in some cases

ghstack-source-id: c0e5216
Pull Request resolved: #113405
@jerryzh168
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 1, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/jerryzh168/932/head branch December 5, 2023 15:28
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
…nd output to workaround histogram observer problem (pytorch#113405)

Pull Request resolved: pytorch#113405
Approved by: https://github.com/mcr229

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: quantization release notes category
