Multiprocessing support for NT by jbschlosser · Pull Request #110292 · pytorch/pytorch · GitHub

@jbschlosser
Contributor

@jbschlosser jbschlosser commented Sep 29, 2023

Stack from ghstack (oldest at bottom):

Fixes #110161

Allows nested tensors (NTs) to be used in DataLoaders with `num_workers > 1`.
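DataLoader workers hand results back to the main process through Python's multiprocessing machinery, so whatever a worker produces has to be serialized on one side and rebuilt on the other. The PR diff is not part of this page, so the following is only a stdlib sketch of that general idea under stated assumptions: a ragged batch is reduced to a flat list of values plus row offsets, pickled, and reassembled. The helpers `flatten`, `rebuild`, and `send_and_receive` are hypothetical names for illustration and are not PyTorch APIs.

```python
import pickle


def flatten(rows):
    """Pack variable-length rows into one flat list plus row boundaries."""
    values, offsets = [], [0]
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return values, offsets


def rebuild(values, offsets):
    """Inverse of flatten: slice the flat list back into rows."""
    return [values[a:b] for a, b in zip(offsets, offsets[1:])]


def send_and_receive(rows):
    # Assumption: this stands in for the worker -> main process hop.
    payload = pickle.dumps(flatten(rows))    # worker side: reduce to plain parts
    values, offsets = pickle.loads(payload)  # main process: read the parts back
    return rebuild(values, offsets)


if __name__ == "__main__":
    print(send_and_receive([[1, 2, 3], [4], [5, 6]]))
```

The point of the sketch is only that a container of differently sized pieces needs an explicit reduce and rebuild pair before it can cross a process boundary; how PyTorch does this for nested tensors is not shown in this thread.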

@pytorch-bot bot added the "release notes: dataloader" label Sep 29, 2023
@pytorch-bot

pytorch-bot bot commented Sep 29, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110292

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit a514685 with merge base 46a5558 (image):

UNSTABLE - The following jobs failed, but the failures were likely due to flakiness present on trunk, and the jobs have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jbschlosser added a commit that referenced this pull request Sep 29, 2023
ghstack-source-id: 710581d
Pull Request resolved: #110292
@jbschlosser added the "topic: improvements" and "release notes: nested tensor" labels and removed the "release notes: dataloader" label Sep 29, 2023
@jbschlosser
Contributor Author

@pytorchbot merge

@pytorch-bot bot added the "ciflow/trunk" label Sep 29, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-12)

Details for Dev Infra team Raised by workflow job

@cpuhrsch
Contributor

Failures look real unfortunately @jbschlosser

Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.

[ghstack-poisoned]
@jbschlosser
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorch-bot

pytorch-bot bot commented Oct 2, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -m/--message, -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.
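The usage message above has the shape of an argparse error, with `-m/--message` and `-c/--classification` both required and the classification limited to five values. The bot's actual parser is not shown in this thread, so the following is a hypothetical reconstruction in Python that reproduces only the behaviour visible here; `build_revert_parser` and `parse_revert` are illustrative names.

```python
import argparse


def build_revert_parser():
    # Hypothetical reconstruction from the usage text; not the bot's real code.
    parser = argparse.ArgumentParser(prog="@pytorchbot revert")
    parser.add_argument("-m", "--message", required=True)
    parser.add_argument(
        "-c",
        "--classification",
        required=True,
        choices=["nosignal", "ignoredsignal", "landrace", "weird", "ghfirst"],
    )
    return parser


def parse_revert(argv):
    """Return the parsed arguments, or None when argparse rejects the command."""
    try:
        return build_revert_parser().parse_args(argv)
    except SystemExit:
        # argparse prints the usage error to stderr and exits on bad input.
        return None


if __name__ == "__main__":
    print(parse_revert(["-m", "Address review comments", "-c", "weird"]))
```

Under this sketch, a bare `revert` or a `revert` with only `-m` is rejected, while supplying both a message and one of the listed classifications is accepted.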

@pytorch-bot

pytorch-bot bot commented Oct 2, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@jbschlosser
Contributor Author

@pytorchbot revert -m "Address review comments" -c "weird"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.

Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.

[ghstack-poisoned]
@jbschlosser
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / build

Details for Dev Infra team Raised by workflow job

@jbschlosser
Contributor Author

@pytorchbot merge -f "ignore spurious failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


@kit1980
Contributor

kit1980 commented Oct 6, 2023

@pytorchbot revert -m "Causes CUDA memory leaks" -c nosignal

RuntimeError: CUDA driver API confirmed a leak in `__main__.TestDataLoaderDeviceTypeCUDA.test_nested_tensor_multiprocessing_context_forkserver_cuda`! Caching allocator allocated memory was 5120 and is now reported as 10240 on device 0. CUDA driver allocated memory was 340459520 and is now 342556672.

https://github.com/pytorch/pytorch/actions/runs/6425541384/job/17449001020
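For a sense of scale, the before and after figures quoted in the error can be turned into deltas. The snippet below only does arithmetic on the four numbers in the message and assumes they are byte counts; it says nothing about where the leak comes from.

```python
# Figures copied from the RuntimeError text (assumed to be bytes).
allocator_before, allocator_after = 5120, 10240
driver_before, driver_after = 340459520, 342556672

allocator_delta = allocator_after - allocator_before
driver_delta = driver_after - driver_before

print(allocator_delta)         # 5120 bytes, i.e. 5 KiB
print(driver_delta / 2**20)    # 2.0, i.e. 2 MiB
```

So the caching allocator's reported usage doubled by 5 KiB across the test, while the driver-level figure grew by exactly 2 MiB.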

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.

@pytorchmergebot
Collaborator

@jbschlosser your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 6, 2023
This reverts commit f17fe89.

Reverted #110292 on behalf of https://github.com/kit1980 due to "Causes CUDA memory leaks" (see the revert comment on #110292).
@jbschlosser jbschlosser reopened this Oct 6, 2023
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.

[ghstack-poisoned]
@jbschlosser
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pytorchmergebot pushed a commit that referenced this pull request Oct 10, 2023
@facebook-github-bot facebook-github-bot deleted the gh/jbschlosser/91/head branch October 14, 2023 14:24

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: nested tensor (Changes that have a direct impact on nested tensors), Reverted, topic: improvements (topic category)


5 participants