Fix max_width computation in _tensor_str._Formatter by raphaelreme · Pull Request #126859 · pytorch/pytorch · GitHub

Conversation

@raphaelreme
Contributor

@raphaelreme raphaelreme commented May 22, 2024

The previous version of `torch._tensor_str._Formatter` did not use `PRINT_OPTS.sci_mode` for the `max_width` computation, but did use it when formatting values, leading to a weird discrepancy.

Now the code first checks whether it should be in sci_mode, then computes `max_width`.

Here is an example to test the behavior:

```python
import torch

A = torch.tensor([10, 1e-1, 1e-2])
B = torch.tensor([10, 1e-1, 1e-1])

print("================= Default =================")
print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")

print("================= sci_mode=False =================")
with torch._tensor_str.printoptions(sci_mode=False):
    print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
    print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")

print("================= sci_mode=True =================")
with torch._tensor_str.printoptions(sci_mode=True):
    print(A, f"Formatter max_width: {torch._tensor_str._Formatter(A).max_width}")
    print(B, f"Formatter max_width: {torch._tensor_str._Formatter(B).max_width}")
```

In the current version this prints:

```
================= Default =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=False =================
tensor([   10.0000,     0.1000,     0.0100]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=True =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 7
```

One can see that with `sci_mode=False`, the values of A are padded out to an unnecessarily large width and do not have the same `max_width` as B (A keeps the `max_width` computed for `sci_mode=None`).

Also, with `sci_mode=True`, B's `max_width` is 7 even though each value takes 10 characters. (This happens to be harmless because the code that consumes `max_width` does not rely heavily on it, but it is still misleading.)

After this commit, the same snippet prints:

```
================= Default =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=False =================
tensor([10.0000,  0.1000,  0.0100]) Formatter max_width: 7
tensor([10.0000,  0.1000,  0.1000]) Formatter max_width: 7
================= sci_mode=True =================
tensor([1.0000e+01, 1.0000e-01, 1.0000e-02]) Formatter max_width: 10
tensor([1.0000e+01, 1.0000e-01, 1.0000e-01]) Formatter max_width: 10
```

This also aligns A with B for `sci_mode=False`.
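
To make the intended change concrete, here is a minimal standalone sketch of the idea (an illustration only, not the actual `_Formatter` implementation): resolve sci_mode first, then derive `max_width` from the strings that will actually be printed.

```python
# Illustrative sketch only -- not the actual torch._tensor_str._Formatter code.
# The point: decide on scientific notation first, then compute max_width from
# the representation that will really be displayed.
def sketch_format(values, sci_mode, precision=4):
    fmt = (lambda v: f"{v:.{precision}e}") if sci_mode else (lambda v: f"{v:.{precision}f}")
    strings = [fmt(float(v)) for v in values]
    max_width = max(len(s) for s in strings)  # width of the widest *displayed* value
    return [s.rjust(max_width) for s in strings], max_width

print(sketch_format([10, 1e-1, 1e-2], sci_mode=False))
# (['10.0000', ' 0.1000', ' 0.0100'], 7)   -> no padding to the sci-mode width of 10
print(sketch_format([10, 1e-1, 1e-2], sci_mode=True))
# (['1.0000e+01', '1.0000e-01', '1.0000e-02'], 10)
```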

@pytorch-bot

pytorch-bot bot commented May 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126859

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 860b118 with merge base 62a49d9:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@colesbury
Member

@huydhn - would you please review this PR?

@colesbury colesbury added the triaged label May 24, 2024
@raphaelreme
Contributor Author

Hi, any updates on this?

@huydhn
Contributor

huydhn commented Jul 23, 2024

Although the change makes sense to me, I think someone from core is in a better position to have the final say, since this changes the tensor print output. I guess I can cc @malfet for a review.

@malfet Please help forward this to the correct code owners for their review if you have someone in mind.

@huydhn huydhn requested review from malfet and removed request for huydhn July 23, 2024 08:55
@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Sep 21, 2024
@raphaelreme
Contributor Author

Hi! I saw the PR was automatically switched to stale. Is there any update on this?

@github-actions github-actions bot closed this Oct 26, 2024
@DerekGloudemans

Bumping, can someone please review this pull request?

@klevzoff

Would love to see this fix reviewed/merged!

@raphaelreme
Contributor Author

Same here, any news for this PR? @malfet

@malfet malfet reopened this Jul 24, 2025
@malfet malfet added the release notes: python_frontend and topic: bug fixes labels Jul 24, 2025
malfet
malfet previously approved these changes Jul 24, 2025
@malfet
Contributor

malfet commented Jul 24, 2025

@raphaelreme can you add some sort of a regression test to test_torch.py?

@malfet
Contributor

malfet commented Jul 24, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rreme/fix_tensor_formatter onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rreme/fix_tensor_formatter && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the rreme/fix_tensor_formatter branch from 3dc2d5e to 5a6e2fb Compare July 24, 2025 15:48
@raphaelreme
Contributor Author

@malfet I simply updated the related test for sci_mode=False, which was testing exactly for this weird behavior with leading zeros. Now the test correctly expects the formatter to use the minimal max_width even in sci_mode=False.

Other tests are already written and should be unaffected by these changes.
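
For illustration, a regression test along these lines could look roughly as follows (a sketch only; the test name and exact assertions are assumptions, not necessarily what the PR's updated test contains):

```python
import torch
from torch._tensor_str import _Formatter, printoptions

def test_formatter_max_width_respects_sci_mode():
    # Hypothetical regression-test sketch based on the example in the PR description.
    a = torch.tensor([10, 1e-1, 1e-2])
    with printoptions(sci_mode=False):
        # Widths should come from the fixed-point strings ("10.0000" is 7 chars),
        # not from the scientific-notation width of 10.
        assert _Formatter(a).max_width == 7
        assert "e+" not in str(a) and "e-" not in str(a)
    with printoptions(sci_mode=True):
        # Every value is printed in scientific notation, 10 characters wide.
        assert _Formatter(a).max_width == 10
```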

@malfet
Contributor

malfet commented Jul 26, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

The previous version did not use PRINT_OPTS.sci_mode for the max_width computation
but did use it for display, leading to a weird discrepancy.

Now the code first checks whether the display should be in sci_mode, then computes max_width
@malfet
Contributor

malfet commented Jul 30, 2025

@pytorchbot merge -f "Signal is green this time around"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@yangw-dev
Contributor

@pytorchbot revert -m "broke trunk with test distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single - RuntimeError: Expected to find buf7 = empty but did not find it" -c nosignal

see
https://github.com/pytorch/pytorch/actions/runs/16624723924/job/47038780720

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Jul 30, 2025
This reverts commit 1465757.

Reverted #126859 on behalf of https://github.com/yangw-dev due to broke trunk with test  distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single - RuntimeError: Expected to find buf7 = empty but did not find it ([comment](#126859 (comment)))
@pytorchmergebot
Collaborator

@raphaelreme your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the Reverted and ci-no-td labels Jul 30, 2025
@pytorch-bot pytorch-bot bot dismissed malfet’s stale review July 30, 2025 16:56

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

@malfet
Contributor

malfet commented Jul 30, 2025

@pytorchbot merge -f "Was reverted by mistake, actual root cause of failure is #147754 "

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: PR #126859 has not been reviewed yet

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
Pull Request resolved: #126859
Approved by: https://github.com/malfet
yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
This reverts commit 1465757.

Reverted #126859 on behalf of https://github.com/yangw-dev due to broke trunk with test  distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single - RuntimeError: Expected to find buf7 = empty but did not find it ([comment](#126859 (comment)))
@raphaelreme
Contributor Author

@malfet I think you have to approve the PR again before merging!

@malfet
Contributor

malfet commented Aug 1, 2025

@pytorchbot merge -f "We are at it again"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here


Labels

ci-no-td (Do not run TD on this PR), Merged, open source, release notes: python_frontend (python frontend release notes category), Reverted, Stale, topic: bug fixes (topic category), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
