[quant][pt2e] Support int16 quantization #108453

jerryzh168 · 2023-09-01T22:25:04Z

Stack from ghstack (oldest at bottom):

-> [quant][pt2e] Support int16 quantization #108453

Summary:
Previously we could only use native PyTorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except for int64, which we can add later if needed).
The main addition here is int16.
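
For context, a minimal sketch of what the wider int16 range buys numerically. This is pure Python with hypothetical helper names (`int_range`, `quantize`, `dequantize`); it is not the observer code from this PR, only an illustration of affine quantization over the int8 and int16 ranges under a symmetric scheme with `zero_point = 0`.

```python
from typing import Tuple


def int_range(bits: int, signed: bool = True) -> Tuple[int, int]:
    """Return (quant_min, quant_max) for an integer type of the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def quantize(x: float, scale: float, zero_point: int, qmin: int, qmax: int) -> int:
    """Affine quantization: round(x / scale) + zero_point, clamped to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer code back to a real value."""
    return (q - zero_point) * scale


int8_min, int8_max = int_range(8)
int16_min, int16_max = int_range(16)

# Cover the same real interval [-1, 1] with both integer ranges.
scale8 = 1.0 / int8_max
scale16 = 1.0 / int16_max

x = 0.12345
err8 = abs(x - dequantize(quantize(x, scale8, 0, int8_min, int8_max), scale8, 0))
err16 = abs(x - dequantize(quantize(x, scale16, 0, int16_min, int16_max), scale16, 0))
```

With the same real-valued interval, the int16 step size is much smaller than the int8 one, so `err16` comes out smaller than `err8` for this input.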

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2023-09-01T22:25:06Z

✅ No Failures

As of commit e5c0134 with merge base 66af4f6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kimishpatel · 2023-09-05T14:56:21Z

torch/ao/quantization/observer.py

            scale = max_val_pos / (float(quant_max - quant_min) / 2)
            scale = torch.max(scale, self.eps)
-            if self.dtype == torch.quint8:
+            if self.dtype in [torch.quint8, torch.uint8]:


why did this have to be changed?

this is to make sure the code works for both quint8 and uint8 dtypes
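
To make the exchange above concrete, here is a rough sketch of symmetric qparam selection. The scale formula mirrors the two context lines in the diff; the body of the `if` branch is not shown in this thread, so the midpoint zero point for unsigned dtypes is an assumption, and `symmetric_qparams`, `unsigned`, and the default `eps` are hypothetical names and values chosen for illustration.

```python
def symmetric_qparams(min_val, max_val, quant_min, quant_max, unsigned, eps=1e-12):
    """Sketch of symmetric scale/zero_point selection (hypothetical helper)."""
    # Largest magnitude observed, treating the range as always containing 0.
    max_abs = max(-min(min_val, 0.0), max(max_val, 0.0))
    # Same formula as the diff: half of the integer range covers max_abs.
    scale = max(max_abs / (float(quant_max - quant_min) / 2), eps)
    # Assumption: unsigned storage places real 0.0 at the midpoint of the
    # integer range, signed storage places it at integer 0.
    zero_point = (quant_min + quant_max + 1) // 2 if unsigned else 0
    return scale, zero_point


scale_u8, zp_u8 = symmetric_qparams(-1.0, 0.5, 0, 255, unsigned=True)
scale_i8, zp_i8 = symmetric_qparams(-1.0, 0.5, -128, 127, unsigned=False)
scale_i16, zp_i16 = symmetric_qparams(-1.0, 0.5, -32768, 32767, unsigned=False)
```

The point of the dtype check is that the unsigned branch has to fire for whichever 8-bit unsigned dtype the observer was given; the signed cases, including int16, keep a zero point of 0.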

kimishpatel

sending back for test fix.

For dtype change in fx workflow, previously it was working because we were mapping torch dtypes to quantized dtypes?

kimishpatel · 2023-09-05T15:10:48Z

test/quantization/pt2e/test_quantize_pt2e.py

+            def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
+                # using int32 to simulate int16
+                int16_qspec = QuantizationSpec(
+                    dtype=torch.int32,


this says dtype=int32 not int16

oh nevermind, I see the restriction on the values. I thought int16 is a supported torch dtype but uint16 is not? Then why not use that dtype directly?

oh I think I can use torch.int16 here directly, not sure why I used int32 in the beginning
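
A small illustration of why a 32-bit container with a restricted value range behaves like int16, as discussed above. This uses only the standard library `array` module; the `clamp` helper and the sample values are made up for the example and are not taken from the test.

```python
from array import array

INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def clamp(q, lo=INT16_MIN, hi=INT16_MAX):
    """Restrict an integer code to the int16 value range."""
    return max(lo, min(hi, q))


raw = [-100_000, -32768, -1, 0, 12345, 32767, 70_000]
clamped = [clamp(q) for q in raw]

# Typecode "i" is a C int (32 bits on common platforms), "h" is a C short.
as_wide = array("i", clamped)
as_int16 = array("h", clamped)  # succeeds only because every value fits in 16 bits

# Without the clamp, the 16-bit container rejects out-of-range values.
try:
    array("h", [70_000])
    overflowed = False
except OverflowError:
    overflowed = True
```

Once the values are clamped to the int16 range, the wide and narrow containers hold the same numbers, which is why using the 16-bit dtype directly is the simpler choice.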

jerryzh168 · 2023-09-05T22:07:14Z

For dtype change in fx workflow, previously it was working because we were mapping torch dtypes to quantized dtypes?

yeah that's correct, we were not really using torch dtypes before in the observer or fx flows

jerryzh168 · 2023-09-06T16:50:37Z

@pytorchbot merge

pytorchmergebot · 2023-09-06T16:53:43Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorch-bot bot added the release notes: quantization label (release notes category) Sep 1, 2023

jerryzh168 requested a review from kimishpatel September 1, 2023 22:25

jerryzh168 requested a review from andrewor14 September 1, 2023 22:25

kimishpatel reviewed Sep 5, 2023

kimishpatel requested changes Sep 5, 2023

andrewor14 reviewed Sep 5, 2023

jerryzh168 requested a review from kimishpatel September 5, 2023 22:05

kimishpatel approved these changes Sep 5, 2023

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Sep 6, 2023

pytorchmergebot added the merging label Sep 6, 2023

pytorchmergebot added Merged and removed merging labels Sep 6, 2023

pytorchmergebot closed this in 32a16d4 Sep 6, 2023

facebook-github-bot deleted the gh/jerryzh168/910/head branch September 10, 2023 14:22

Juelianqvq mentioned this pull request Dec 27, 2023

Does tinynn support following int16 quantization? alibaba/TinyNeuralNetwork#275

Closed

4 participants
