KEMBAR78
Do not use `<filesystem>` on Linux by malfet · Pull Request #134494 · pytorch/pytorch · GitHub
Skip to content

Conversation

malfet
Copy link
Contributor

@malfet malfet commented Aug 26, 2024

Because right now it leads to symbol conflict from binary builds.
Use of std::filesystem::file_exists was introduced by #126601 and in this PR it is replaced with a very straightforward implementation that calls stat on the given path, which is a classic C-way of checking for the file existence.

This PR should be reverted once one figures out how to keep std::filesystem methods linked into the binary private

Fixes symptoms of #133437

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Because right now it leads to symbol conflict from binary builds

This PR should be reverted once one figures out how to keep `std::filesystem` methods linked into the binary private
@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category labels Aug 26, 2024
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134494

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 16 Cancelled Jobs

As of commit 9ffe0f8 with merge base 0a3c064 (image):

NEW FAILURE - The following job has failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@malfet malfet requested a review from d4l3k August 26, 2024 20:05
@malfet malfet added topic: not user facing topic category ciflow/trunk Trigger trunk jobs on your pull request labels Aug 26, 2024
Copy link
Contributor

@atalman atalman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@d4l3k d4l3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@malfet
Copy link
Contributor Author

malfet commented Aug 27, 2024

There is one more use case, in aoti_runner...

@atalman atalman added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Aug 27, 2024
@atalman
Copy link
Contributor

atalman commented Aug 27, 2024

confirmed this is resolved. By installing pip install torch-2.5.0.dev20240827+cpu-cp311-cp311-linux_x86_64.whl
Then running:

nm -D libtorch_cpu.so | grep "recursive_directory_iterator"

@atalman
Copy link
Contributor

atalman commented Aug 27, 2024

@pytorchmergebot merge -f "Builds jobs are passing"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@malfet
Copy link
Contributor Author

malfet commented Aug 27, 2024

@atalman thank you for the merge, but I've tried downloading other builds and still seeing some symbols there....

@atalman
Copy link
Contributor

atalman commented Aug 27, 2024

@malfet confirmed - it works with cuda binary as well. torch-2.5.0.dev20240827+cu121-cp311-cp311-linux_x86_64.whl

@atalman
Copy link
Contributor

atalman commented Aug 27, 2024

@pytorchbot cherry-pick --onto release/2.4 -c critical

@pytorchbot
Copy link
Collaborator

Cherry picking #134494

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 0fa0ac80e4cb2aa7eba5c5753be6c428b5eee45e returned non-zero exit code 1

Auto-merging torch/csrc/distributed/c10d/control_plane/WorkerServer.cpp
CONFLICT (content): Merge conflict in torch/csrc/distributed/c10d/control_plane/WorkerServer.cpp
Auto-merging torch/csrc/inductor/aoti_runner/model_container_runner.cpp
CONFLICT (content): Merge conflict in torch/csrc/inductor/aoti_runner/model_container_runner.cpp
error: could not apply 0fa0ac80e4... Do not use `<filesystem>` on Linux (#134494)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

atalman pushed a commit to atalman/pytorch that referenced this pull request Aug 27, 2024
Because right now it leads to symbol conflict from binary builds.
Use of `std::filesystem::file_exists` was introduced by pytorch#126601 and in this PR it is replaced with a very straightforward implementation that calls `stat` on the given path, which is a classic C-way of checking for the file existence.

This PR should be reverted once one figures out how to keep `std::filesystem` methods linked into the binary private

Fixes symptoms of pytorch#133437

Pull Request resolved: pytorch#134494
Approved by: https://github.com/atalman, https://github.com/d4l3k
atalman added a commit that referenced this pull request Aug 27, 2024
Because right now it leads to symbol conflict from binary builds.
Use of `std::filesystem::file_exists` was introduced by #126601 and in this PR it is replaced with a very straightforward implementation that calls `stat` on the given path, which is a classic C-way of checking for the file existence.

This PR should be reverted once one figures out how to keep `std::filesystem` methods linked into the binary private

Fixes symptoms of #133437

Pull Request resolved: #134494
Approved by: https://github.com/atalman, https://github.com/d4l3k

Co-authored-by: Nikita Shulga <nshulga@meta.com>
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
Because right now it leads to symbol conflict from binary builds.
Use of `std::filesystem::file_exists` was introduced by pytorch#126601 and in this PR it is replaced with a very straightforward implementation that calls `stat` on the given path, which is a classic C-way of checking for the file existence.

This PR should be reverted once one figures out how to keep `std::filesystem` methods linked into the binary private

Fixes symptoms of pytorch#133437

Pull Request resolved: pytorch#134494
Approved by: https://github.com/atalman, https://github.com/d4l3k
pytorchmergebot pushed a commit that referenced this pull request Oct 3, 2024
Similar to: #134494
We are seeing come back of #133437 due to use of filesystem on Linux

Pull Request resolved: #137209
Approved by: https://github.com/kit1980, https://github.com/malfet
kit1980 added a commit that referenced this pull request Oct 3, 2024
* Apply changes from #135374

* Fix dependency on filesystem on Linux (#137209)

Similar to: #134494
We are seeing come back of #133437 due to use of filesystem on Linux

Pull Request resolved: #137209
Approved by: https://github.com/kit1980, https://github.com/malfet

---------

Co-authored-by: atalman <atalman@fb.com>
@github-actions github-actions bot deleted the malfet/do-not-use-std-filesystem branch October 25, 2024 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants