[OSS] Enable Flight Recorder buffer for all by c-p-i-o · Pull Request #142260 · pytorch/pytorch · GitHub

Contributor

@c-p-i-o c-p-i-o commented Dec 6, 2024

Summary: Enable collecting Flight Recorder data for all.

Test Plan: This has been rolled out internally for a while now.

Differential Revision: D66897635

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k


pytorch-bot bot commented Dec 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142260

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 8b44b03 with merge base 7597ab6 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the "oncall: distributed" and "release notes: distributed (c10d)" labels Dec 6, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D66897635

Contributor

kwen2501 commented Dec 6, 2024

Just for our record, with this flag turned on, we are still leaving the following flags off:

  1. ENABLE_TIMING
  2. DUMP_ON_TIMEOUT

@c-p-i-o does that reflect the case?

@kwen2501 kwen2501 self-requested a review December 6, 2024 21:34
Contributor

@kwen2501 kwen2501 left a comment


Great! Thanks for turning it on!

@pytorch-bot pytorch-bot bot added the "ciflow/trunk" label Dec 6, 2024

netlify bot commented Dec 6, 2024

Deploy Preview for chimerical-cranachan-793287 ready!

Name Link
🔨 Latest commit 8b44b03
🔍 Latest deploy log https://app.netlify.com/sites/chimerical-cranachan-793287/deploys/67537b1f9caeb1000860c5df
😎 Deploy Preview https://deploy-preview-142260--chimerical-cranachan-793287.netlify.app

Contributor

@fduwjj fduwjj left a comment


Yeah!!

Contributor Author

c-p-i-o commented Dec 6, 2024

> Just for our record, with this flag turned on, we are still leaving the following flags off:
>
>   1. ENABLE_TIMING
>   2. DUMP_ON_TIMEOUT
>
> @c-p-i-o does that reflect the case?

Yup - we're not touching ENABLE_TIMING and DUMP_ON_TIMEOUT with this change @kwen2501
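
For anyone who wants the two flags that stay off, here is a minimal sketch of opting in explicitly. It assumes the short names in this thread correspond to the environment variables `TORCH_NCCL_TRACE_BUFFER_SIZE`, `TORCH_NCCL_ENABLE_TIMING`, and `TORCH_NCCL_DUMP_ON_TIMEOUT` described in the PyTorch Flight Recorder documentation, and that they must be set before the NCCL process group is created. The helper `flight_recorder_env` and the buffer size used below are hypothetical and not part of this PR.

```python
import os

# Assumed variable names (from the Flight Recorder docs, not from this PR thread).
BUFFER_SIZE_VAR = "TORCH_NCCL_TRACE_BUFFER_SIZE"
ENABLE_TIMING_VAR = "TORCH_NCCL_ENABLE_TIMING"
DUMP_ON_TIMEOUT_VAR = "TORCH_NCCL_DUMP_ON_TIMEOUT"


def flight_recorder_env(buffer_size=None, enable_timing=False, dump_on_timeout=False):
    """Build environment overrides for the Flight Recorder flags.

    Hypothetical helper: only flags that are explicitly requested are emitted,
    so anything left out keeps whatever default the installed PyTorch uses.
    """
    env = {}
    if buffer_size is not None:
        # Number of entries kept in the recorder's ring buffer.
        env[BUFFER_SIZE_VAR] = str(buffer_size)
    if enable_timing:
        env[ENABLE_TIMING_VAR] = "1"
    if dump_on_timeout:
        env[DUMP_ON_TIMEOUT_VAR] = "1"
    return env


if __name__ == "__main__":
    # Apply before torch.distributed.init_process_group(...) runs in this process.
    os.environ.update(flight_recorder_env(enable_timing=True, dump_on_timeout=True))
    for name in (BUFFER_SIZE_VAR, ENABLE_TIMING_VAR, DUMP_ON_TIMEOUT_VAR):
        print(name, "=", os.environ.get(name, "<library default>"))
```

Leaving `buffer_size` unset keeps whatever buffer default the installed build ships with, so the sketch only touches the two flags this change leaves alone.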

Summary:

Enable collecting Flight Recorder data for all.

Test Plan: This has been rolled out internally for a while now.

Reviewed By: kwen2501

Differential Revision: D66897635

@c-p-i-o c-p-i-o self-assigned this Dec 6, 2024
@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
Summary: Enable collecting Flight Recorder data for all.

Test Plan: This has been rolled out internally for a while now.

Differential Revision: D66897635

Pull Request resolved: pytorch#142260
Approved by: https://github.com/kwen2501, https://github.com/fduwjj, https://github.com/wconstab