[pt2e] Avoid getting model device once per node by andrewor14 · Pull Request #159901 · pytorch/pytorch · GitHub

Conversation

@andrewor14
Contributor

@andrewor14 andrewor14 commented Aug 5, 2025

Summary: Previously, we called `assert_and_get_unique_device` once per node in both prepare and convert. This is expensive and unnecessary since the model device is the same across all nodes, so we should call it once at the beginning and reuse the same model device for all nodes.

Test Plan:
python test/test_quantization.py -k TestQuantizePT2E

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv
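
The optimization described in the summary can be sketched as follows. Note that `get_unique_device` is a simplified, hypothetical stand-in for `assert_and_get_unique_device`, and `process_nodes` is an illustrative loop, not the actual prepare/convert code:

```python
import torch
import torch.nn as nn

def get_unique_device(model: nn.Module):
    # Simplified, hypothetical stand-in for the real helper:
    # collect the devices of all parameters and buffers and
    # require that there is at most one.
    devices = {p.device for p in model.parameters()} | {
        b.device for b in model.buffers()
    }
    assert len(devices) <= 1, f"expected at most one device, got {devices}"
    return next(iter(devices)) if devices else None

def process_nodes(model: nn.Module, nodes):
    # Before this PR: the device lookup ran inside the loop, once per node.
    # After: compute it once up front and reuse it for every node.
    device = get_unique_device(model)
    return [(node, device) for node in nodes]
```

Since the lookup walks all parameters and buffers, hoisting it out of the per-node loop turns O(nodes × params) work into a single pass.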

@andrewor14 andrewor14 requested a review from jerryzh168 August 5, 2025 20:56
@pytorch-bot

pytorch-bot bot commented Aug 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159901

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c4be20c with merge base 791eff9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

@andrewor14 has imported this pull request. If you are a Meta employee, you can view this in D79674759.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 5, 2025
```python
"Both 'meta' and 'cpu' are present in the list of devices. Module can have one device. We Select 'cpu'."
)
devices = {torch.device("cpu")}
```
Contributor Author

@jerryzh168 is it safe to remove this?

Contributor

let me check

@andrewor14 andrewor14 added the topic: improvements topic category label Aug 5, 2025
@andrewor14 andrewor14 force-pushed the pt2e-cache-model-device branch from 326006f to 82e8fd3 Compare August 5, 2025 21:05

@andrewor14 andrewor14 force-pushed the pt2e-cache-model-device branch from 82e8fd3 to 4eeaee7 Compare August 5, 2025 21:18
andrewor14 added a commit to pytorch/ao that referenced this pull request Aug 5, 2025
**Summary:** Previously, we called `assert_and_get_unique_device`
once per node in convert. This is expensive and unnecessary since
the model device is the same across all nodes, so we should call
it once at the beginning and reuse the same model device for all
nodes.

torchao version of pytorch/pytorch#159901

**Test Plan:**
```
python test/quantization/pt2e/test_quantize_pt2e.py
```

@andrewor14 andrewor14 force-pushed the pt2e-cache-model-device branch 2 times, most recently from 1a3b405 to f1b7d46 Compare August 5, 2025 21:39
andrewor14 added a commit to pytorch/ao that referenced this pull request Aug 5, 2025
**Summary:** Previously, we called `assert_and_get_unique_device`
once per node in both prepare and convert. This is expensive and
unnecessary since the model device is the same across all nodes,
so we should call it once at the beginning and reuse the same
model device for all nodes.

torchao version of pytorch/pytorch#159901

Note: The prepare path is not completely done yet, since we are
blocked on the pytorch PR being merged. It differs from convert
in that it still calls utility functions from
`torch.ao.quantization.fx`.

**Test Plan:**
```
python test/quantization/pt2e/test_quantize_pt2e.py
```

@andrewor14 andrewor14 force-pushed the pt2e-cache-model-device branch from f1b7d46 to 16077ad Compare August 6, 2025 00:03

@iremyux
Copy link
Collaborator

iremyux commented Aug 6, 2025

Adding the ciflow/win-arm64 label to trigger Windows Arm64 CI for its test purposes; nothing about this PR specifically. (It should not affect the acceptance of the PR even if it fails.)

andrewor14 added a commit to pytorch/ao that referenced this pull request Aug 6, 2025
@andrewor14 andrewor14 force-pushed the pt2e-cache-model-device branch from 16077ad to 27e08a5 Compare August 6, 2025 16:04

@andrewor14 andrewor14 force-pushed the pt2e-cache-model-device branch from 27e08a5 to 64aae3a Compare September 2, 2025 21:34

andrewor14 added a commit to pytorch/ao that referenced this pull request Sep 2, 2025
```python
model.graph,
"_scale",
scale,
scale.device,
```
Contributor

`value` could be a non-Tensor, I think, according to `create_getattr_from_value` (L274), so should this be the device from the model?

Contributor Author

Ok, it seems that if we just get it from the model there will be no benefit, so how about we get it from the scale here if it's a tensor, and from the model otherwise?
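
The fallback this thread converges on (take the device from the value when it is a tensor, otherwise use the cached model device) can be sketched as follows. `device_for_value` is a hypothetical helper name for illustration, not code from this PR:

```python
import torch

def device_for_value(value, model_device):
    # Sketch of the reviewer-suggested fallback: create_getattr_from_value
    # may receive a non-Tensor value, so only read `.device` when the value
    # is actually a tensor; otherwise fall back to the cached model device.
    if isinstance(value, torch.Tensor):
        return value.device
    return model_device
```

This preserves the perf win (no per-node model walk) while staying correct for non-tensor attributes.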

andrewor14 added a commit to pytorch/ao that referenced this pull request Sep 3, 2025
**Summary:** Previously, we called `assert_and_get_unique_device`
once per node in both prepare and convert. This is expensive and
unnecessary since the model device is the same across all nodes,
so we should call it once at the beginning and reuse the same
model device for all nodes.

**Test Plan:**
python test/test_quantization.py -k TestQuantizePT2E

ghstack-source-id: 3e72b14
Pull Request resolved: #162012
@andrewor14 andrewor14 force-pushed the pt2e-cache-model-device branch from 64aae3a to c4be20c Compare September 3, 2025 14:45

@andrewor14
Contributor Author

@pytorchbot merge

andrewor14 added a commit to pytorch/ao that referenced this pull request Sep 3, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@andrewor14 andrewor14 deleted the pt2e-cache-model-device branch September 3, 2025 19:37
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
**Summary:** Previously, we called `assert_and_get_unique_device` once per node in both prepare and convert. This is expensive and unnecessary since the model device is the same across all nodes, so we should call it once at the beginning and reuse the same model device for all nodes.

**Test Plan:**
python test/test_quantization.py -k TestQuantizePT2E

Pull Request resolved: pytorch#159901
Approved by: https://github.com/jerryzh168
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), fx, Merged, release notes: AO frontend, release notes: quantization (release notes category), topic: improvements (topic category)
