KEMBAR78
Fix for gcc10 torch.compile compiler error when march=aarch64+sve by robert-hardwick · Pull Request #137795 · pytorch/pytorch · GitHub
Skip to content

Conversation

@robert-hardwick
Copy link
Collaborator

@robert-hardwick robert-hardwick commented Oct 11, 2024

Disable tree vectorize in vec_convert.h for gcc10 and aarch64+sve which causes compiler error to occur.

/tmp/tmpuqk7lj9j/zx/czx2eyturb6j6m727xhvknkjbdu3y5nqqk66wgxcjkwnxuzvpm5r.cpp:3:18: internal compiler error: in vect_get_vector_types_for_stmt, at tree-vect-stmts.c:12252
    3 | extern "C"  void kernel(const float* in_ptr0,

Fixes #137775

I've not linked a gcc bug report yet as they require a minimal reproducer to be made.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137795

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5628514 with merge base 9aaf3a0 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Oct 11, 2024
@robert-hardwick
Copy link
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 11, 2024
@malfet
Copy link
Contributor

malfet commented Oct 11, 2024

@pytorchbot label "topic: not user facing"

I disagree that this is a not user facing change, it should be properly marked as bugfix

@malfet malfet added the ciflow/linux-aarch64 linux aarch64 CI workflow label Oct 11, 2024
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 11, 2024

Please seek CI approval before scheduling CIFlow labels

@pytorch-bot pytorch-bot bot removed the ciflow/linux-aarch64 linux aarch64 CI workflow label Oct 11, 2024
@malfet malfet added the ciflow/linux-aarch64 linux aarch64 CI workflow label Oct 11, 2024
@robert-hardwick robert-hardwick marked this pull request as draft October 15, 2024 21:23
@robert-hardwick
Copy link
Collaborator Author

I've converted this to draft to prevent it from being merged. Whilst this fixed the problem empirically, I think a safer fix is to disable gcc's tree-vectorizor and tree-loop-vectorizer in older versions of gcc for sve supported platforms.

@robert-hardwick
Copy link
Collaborator Author

I've changed the fix to disable tree-vectorize optimizations accross the whole of vec_base.h for gcc versions from 10.0 up to 10.2 where the compiler error is known to occur.
This is a more broad and inclusive fix than I had proposed previously.

#137775 (comment)

I have propsed to target a specific version of gcc for this because when I made some quick performance comparisons there with GCC 11 with and without the optimizations for SVE there was a significant performance hit on inductor micro benchmarks.

@maajidkhann
Copy link
Contributor

I've changed the fix to disable tree-vectorize optimizations accross the whole of vec_base.h for gcc versions from 10.0 up to 10.2 where the compiler error is known to occur. This is a more broad and inclusive fix than I had proposed previously.

#137775 (comment)

I have propsed to target a specific version of gcc for this because when I made some quick performance comparisons there with GCC 11 with and without the optimizations for SVE there was a significant performance hit on inductor micro benchmarks.

I agree, optimizations for SVE is critical to get performance boost.

Summary of Key Developments w.r.t GCC for SVE:
GCC 8.x: Experimental support, minimal SVE features.
GCC 9.x: Introduced -march=armv8.2-a+sve support, better but still developing vectorization and SVE handling.
GCC 10.x: Solid stability and improvements in optimizations for SVE workloads.
GCC 11.x: Focused on enhancing performance, debug support, and SVE optimizations.
GCC 12.x and 13.x: Full maturity with best-in-class support for modern ARM cores, optimizations, and feature completeness for production use.

Summary of GCC versions in the stable Ubuntu releases:
Ubuntu 22.04 LTS: GCC 11.2 (default)
Ubuntu 22.10: GCC 12.2 (default)
Ubuntu 23.04: GCC 13.x (default)

Most of the Linux CI pipelines in PyTorch comes with Ubuntu 22.04 OS which has GCC 11.2 as default compiler and now we plan to move PyTorch CI to use GCC 12. I feel, starting from GCC 11, the SVE path of the compiler is more stable.

"I would suggest we should disable SVE build if the GCC compiler version is <11 (GCC < 11). This would avoid the bugs that we encounter mostly with gcc 10."

Since GCC is already at 13.x, I think we will still be supporting good range of GCC compiler versions for SVE.

CC @robert-hardwick @malfet

@malfet
Copy link
Contributor

malfet commented Nov 5, 2024

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased gcc_compiler_fix onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout gcc_compiler_fix && git pull --rebase)

@malfet
Copy link
Contributor

malfet commented Nov 5, 2024

@pytorchbot merge -r

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 5, 2024
@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Tried to rebase and push PR #137795, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@maajidkhann
Copy link
Contributor

@pytorchbot merge -r

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased gcc_compiler_fix onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout gcc_compiler_fix && git pull --rebase)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…torch#137795)

Disable tree vectorize in vec_convert.h for gcc10 and aarch64+sve which causes compiler error to occur.

```
/tmp/tmpuqk7lj9j/zx/czx2eyturb6j6m727xhvknkjbdu3y5nqqk66wgxcjkwnxuzvpm5r.cpp:3:18: internal compiler error: in vect_get_vector_types_for_stmt, at tree-vect-stmts.c:12252
    3 | extern "C"  void kernel(const float* in_ptr0,
```
Fixes pytorch#137775

I've not linked a gcc bug report yet as they require a minimal reproducer to be made.

Pull Request resolved: pytorch#137795
Approved by: https://github.com/malfet
pytorchmergebot pushed a commit that referenced this pull request Mar 10, 2025
Summary: autovec miscompiles on patterns of the type:
```cpp
for (const auto i : c10::irange())
```
Same issue as described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117001 and addressed by #137795 for gcc, but not clang

Test Plan:
buck2 build //caffe2/caffe2/fb/transforms:sigrid_interface

Differential Revision: D70422723

Pull Request resolved: #148489
Approved by: https://github.com/malfet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/linux-aarch64 linux aarch64 CI workflow ciflow/trunk Trigger trunk jobs on your pull request Merged module: cpu CPU specific problem (e.g., perf, algorithm) open source release notes: inductor topic: bug fixes topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

inductor regression on aarch64 neoverse-v1, failing unit tests due to compiler error on gcc10.2

5 participants