KEMBAR78
Add XPU compiler version control in cmake to keep BC by guangyey · Pull Request #139258 · pytorch/pytorch · GitHub
Skip to content

Conversation

@guangyey
Copy link
Collaborator

@guangyey guangyey commented Oct 30, 2024

Stack from ghstack (oldest at bottom):

Motivation

This PR aims to maintain backward compatibility when building PyTorch XPU with the old and new compilers.

Additional Context

The details are described here. The new compiler (2025.0.0) has some breaking changes compared with the old compiler(2024.1), for examples:

  1. On Windows, sycl library is named sycl7.lib in the old compiler but is named sycl.lib in the new compiler.
  2. On Linux, in order to support ABI=0, we have to link libsycl-preview.so in the old compiler but we could link libsycl.so in the new compiler to have the same ABI compatibility.
  3. We added a macro SYCL_COMPILER_VERSION to support our new code has good backward compatibility with the old compiler. Now the new feature(Event elapsed_time, memory summary, and device architecture property) introduced by the new compiler will be controlled within the macro SYCL_COMPILER_VERSION.

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @EikanWang @fengyuan14

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139258

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5a3e292 with merge base dfcf740 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey changed the title Add SYCL version control in cmake to keep BC [WIP] Add SYCL version control in cmake to keep BC Oct 30, 2024
@guangyey guangyey marked this pull request as draft October 30, 2024 04:07
@guangyey guangyey requested a review from EikanWang October 30, 2024 04:12
@guangyey guangyey added release notes: xpu release notes category module: xpu Intel XPU related issues labels Oct 30, 2024
@guangyey guangyey requested a review from gujinghui October 30, 2024 04:13
list(APPEND SYCL_INCLUDE_DIR ${SYCL_INCLUDE_SYCL_DIR})

# Find include/sycl/version.hpp to fetch sycl compiler version
if(EXISTS ${SYCL_INCLUDE_SYCL_DIR}/version.hpp)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to use find_file, to align with above lines, for example line #34

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# 3. Strip leading and trailing spaces to get the version number.
string(REGEX MATCH "__SYCL_COMPILER_VERSION[ ]+[0-9]+" TEMP1 ${SYCL_VERSION_HEADER_CONTENT})
string(REPLACE "__SYCL_COMPILER_VERSION" "" TEMP2 ${TEMP1})
string(STRIP ${TEMP2} SYCL_COMPILER_VERSION)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give reasonable name to TEMP* vars?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Oct 30, 2024
guangyey added a commit that referenced this pull request Oct 30, 2024
ghstack-source-id: 475f712
Pull Request resolved: #139258
@guangyey guangyey requested a review from gujinghui October 30, 2024 05:38
@gujinghui
Copy link
Collaborator

@guangyey
BTW, should we record the IGC version, as well? As what we did in IPEX?

@guangyey
Copy link
Collaborator Author

@guangyey BTW, should we record the IGC version, as well? As what we did in IPEX?

Currently, I'm not sure what we can do using the IGC version?

@gujinghui
Copy link
Collaborator

@guangyey BTW, should we record the IGC version, as well? As what we did in IPEX?

Currently, I'm not sure what we can do using the IGC version?

Record for debugging as part of build log at least?

@pytorch-bot pytorch-bot bot added ciflow/linux-aarch64 linux aarch64 CI workflow module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration labels Oct 30, 2024
[ghstack-poisoned]
[ghstack-poisoned]
@guangyey guangyey requested a review from ezyang November 6, 2024 01:21
@atalman atalman added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Nov 6, 2024
@atalman
Copy link
Contributor

atalman commented Nov 6, 2024

@guangyey this looks ok. I see it modifies both Windows and Linux builds. Could you please include some testing details ? Like extract from the build log to show that this logic is performing correctly.

Please rebase the PR to fix issue with aarch64 builds

@guangyey
Copy link
Collaborator Author

guangyey commented Nov 7, 2024

Thanks, @atalman I added and supplemented a UT in https://github.com/pytorch/pytorch/pull/139466/files#diff-90dbc2d4fedeae4036c450fc822bb422e97b4a8e576138d6811827a0c8ceeb13R433-R449 to validate the build flow as we expected.
I will rebase this PR to fix the aarch64 build issue.

@guangyey
Copy link
Collaborator Author

guangyey commented Nov 7, 2024

@atalman It looks like the aarch64 build failures are unrelated to this PR, please see #139971

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
Copy link
Contributor

@atalman atalman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

[ghstack-poisoned]
Comment on lines +99 to +101
foreach(flag ${XPU_HOST_CXX_FLAGS})
add_definitions(${flag})
endforeach()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an anti-pattern, one should not add global flags to all targets, only to the ones that needs them

Copy link
Collaborator Author

@guangyey guangyey Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate your feedback and completely agree with your point.
Currently, we face a technical challenge with __INTEL_PREVIEW_BREAKING_CHANGES macro in <sycl/sycl.hpp>, which is critical for maintaining ABI backward compatibility. So we have to propagate this macro to all libraries that will probably use it, such as c10_xpu, libtorch_cpu, libtorch_xpu, libtorch_python, and even third party libraries including oneDNN, torch-xpu-ops, and kineto. I will investigate this issue, and refine it once I identify a more elegant method.

@guangyey
Copy link
Collaborator Author

guangyey commented Nov 8, 2024

The XPU CI failure depends on #140095

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Nov 9, 2024
# Motivation
We add a new attribute `torch.version.xpu` to facilitate the problem diagnosing and version control.

# Additional Context
It is aligned with `torch.version.cuda` and `torch.version.hip`.

Pull Request resolved: #139466
Approved by: https://github.com/EikanWang, https://github.com/ezyang, https://github.com/atalman, https://github.com/malfet
ghstack dependencies: #139258
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
# Motivation
This PR aims to maintain backward compatibility when building PyTorch XPU with the old and new compilers.

# Additional Context
The details are described here. The new compiler (2025.0.0) has some breaking changes compared with the old compiler(2024.1), for examples:
1. On Windows, sycl library is named `sycl7.lib` in the old compiler but is named `sycl.lib` in the new compiler.
2. On Linux, in order to support ABI=0, we have to link `libsycl-preview.so` in the old compiler but we could link `libsycl.so` in the new compiler to have the same ABI compatibility.
3. We added a macro `SYCL_COMPILER_VERSION` to support our new code has good backward compatibility with the old compiler. Now the new feature(Event elapsed_time, memory summary, and device architecture property) introduced by the new compiler will be controlled within the macro `SYCL_COMPILER_VERSION`.

Pull Request resolved: pytorch#139258
Approved by: https://github.com/EikanWang, https://github.com/atalman, https://github.com/gujinghui
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
# Motivation
We add a new attribute `torch.version.xpu` to facilitate the problem diagnosing and version control.

# Additional Context
It is aligned with `torch.version.cuda` and `torch.version.hip`.

Pull Request resolved: pytorch#139466
Approved by: https://github.com/EikanWang, https://github.com/ezyang, https://github.com/atalman, https://github.com/malfet
ghstack dependencies: pytorch#139258
@github-actions github-actions bot deleted the gh/guangyey/85/head branch December 11, 2024 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR ciflow/linux-aarch64 linux aarch64 CI workflow ciflow/xpu Run XPU CI tasks Merged module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration module: xpu Intel XPU related issues open source release notes: xpu release notes category

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants