KEMBAR78
Add `torch.utils.deterministic.fill_uninitialized_memory` flag by kurtamohler · Pull Request #111377 · pytorch/pytorch · GitHub
Skip to content

Conversation

@kurtamohler
Copy link
Collaborator

@kurtamohler kurtamohler commented Oct 16, 2023

Part of #109802

cc @mruberry

@kurtamohler kurtamohler added module: determinism release notes: python_frontend python frontend release notes category labels Oct 16, 2023
@kurtamohler kurtamohler requested a review from albanD October 16, 2023 17:13
@kurtamohler kurtamohler requested review from a team and kulinseth as code owners October 16, 2023 17:13
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 16, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111377

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit a0731c0 with merge base 7fe51e3 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/mps Run MPS tests (subset of trunk) label Oct 16, 2023
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!
Just small small update for the public API check but SGTM otherwise

@kurtamohler kurtamohler force-pushed the fill_uninitialized_memory branch from dfd004f to be5ee29 Compare October 16, 2023 20:26
@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 16, 2023
@kurtamohler kurtamohler force-pushed the fill_uninitialized_memory branch from be5ee29 to 7678635 Compare October 17, 2023 14:46
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@kurtamohler
Copy link
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 25, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch rebase origin/main returned non-zero exit code 1

Rebasing (1/1)
Auto-merging torch/utils/__init__.py
CONFLICT (content): Merge conflict in torch/utils/__init__.py
error: could not apply fee9e3f5f92... Add `torch.utils.deterministic.fill_uninitialized_memory` flag (#111377)
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply fee9e3f5f92... Add `torch.utils.deterministic.fill_uninitialized_memory` flag (#111377)
Details for Dev Infra team Raised by workflow job

@kurtamohler kurtamohler force-pushed the fill_uninitialized_memory branch from 7678635 to a0731c0 Compare October 25, 2023 22:10
@kurtamohler
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@aaronenyeshi
Copy link
Member

This is causing test/profiler/test_profiler.py: test_tensor_properties to timeout and fail internally. Any chance you can help take a look? Thank you.

https://github.com/pytorch/pytorch/blob/main/test/profiler/test_profiler.py#L2308

@facebook-github-bot
Copy link
Contributor

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).)

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Copy link
Collaborator

@kurtamohler your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 29, 2023
@kurtamohler
Copy link
Collaborator Author

Thanks for letting me know, I will take a look this week

@kurtamohler
Copy link
Collaborator Author

kurtamohler commented Oct 31, 2023

@aaronenyeshi, when I run python test/profiler/test_profiler.py -k test_tensor_properties using the changes from this PR, I don't get any timeout. In fact, it seems that none of the tests in test/profiler/test_profiler.py even turn deterministic mode on.

I tried adding torch.use_deterministic_algorithms(True) to the top of the test/profiler/test_profiler.py file after the imports, but that didn't cause any timeouts on my machine either. However, it did cause a few of the other tests in that file (but not the test_tensor_properties test itself) to fail due to a different pattern of memory allocations in deterministic vs nondeterministic mode. Those failing tests explicitly check memory allocations against an expected pattern.

Are you sure that this PR is really the reason for the internal failure? Maybe it's just a flaky failure. Or maybe the internal testing infrastructure is doing something to turn deterministic mode on before running some of the profiler tests. Could you verify that deterministic mode is actually turned on when the internal CI runs the test_tensor_properties test? If the test is not being run with deterministic mode turned on, I'm not sure how this PR could be the cause of the failure

@albanD albanD reopened this Oct 31, 2023
@aaronenyeshi
Copy link
Member

aaronenyeshi commented Nov 1, 2023

Sorry, I think its unrelated, and the automatic bisect flagged this PR. It looks like that test is still having timeout issues. See D50763398 . Please feel free to re-land this diff. Sorry for the inconvenience!

Also, it seems that its failing only in dev-tsan mode, rather than opt or dev modes. https://www.internalfb.com/intern/test/281475086418213/

kurtamohler added a commit to kurtamohler/pytorch that referenced this pull request Nov 1, 2023
@kurtamohler
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@kurtamohler
Copy link
Collaborator Author

kurtamohler commented Nov 1, 2023

Huh, for some reason CI did not rerun after I rebased, so the merge command merged this immediately. I guess it's probably ok, since this PR passed CI before

@huydhn
Copy link
Contributor

huydhn commented Nov 1, 2023

@pytorchbot drci

(please ignore this comment, I'm testing Dr.CI)

@kit1980 kit1980 removed the Reverted label Nov 2, 2023
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/mps Run MPS tests (subset of trunk) ciflow/trunk Trigger trunk jobs on your pull request Merged module: determinism open source release notes: python_frontend python frontend release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants