Add skip_first_wait to profiler.schedule (V2) by sraikund16 · Pull Request #141512 · pytorch/pytorch · GitHub

Conversation

@sraikund16 (Contributor) commented Nov 25, 2024

Summary:
Another attempt at D66198138. The original diff hit an odd type-checking issue, so everything is typed as int this time to work around it.

Addresses #91888.
We use wait as the number of steps to wait between profiling cycles and skip_first to delay the start of profiling. However, once the skip_first steps are complete, we go straight into the wait phase. That is not a problem when wait is smaller than skip_first, since we can simply lower skip_first, but when wait is larger, the first profiling cycle starts much later than desired. For example, with skip_first=1, wait=100, and repeat=2, we do want to wait 100 steps between cycle 1 and cycle 2, but we may not want warmup for cycle 1 to start only at step 101 (which is forced because the wait phase comes directly after the skipped steps). This diff addresses that by adding a flag, skip_first_wait, that skips the first wait phase.
The new flag defaults to false, so existing behavior is unchanged.
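To make the timing concrete, here is a small sketch (assuming a PyTorch build that includes this change). torch.profiler.schedule returns a callable that maps a global step index to a ProfilerAction, so printing the first few steps shows when warmup actually begins:

    from torch.profiler import schedule

    # Without the new flag: skip_first=1 is followed immediately by wait=100,
    # so warmup for cycle 1 does not begin until step 101.
    old = schedule(wait=100, warmup=3, active=1, repeat=2, skip_first=1)

    # With skip_first_wait=1: the first wait is skipped, so warmup for cycle 1
    # begins at step 1, while the 100-step wait still applies between cycles.
    new = schedule(wait=100, warmup=3, active=1, repeat=2, skip_first=1,
                   skip_first_wait=1)

    for step in range(6):
        print(step, old(step), new(step))
    # new: steps 1-3 -> WARMUP, step 4 -> RECORD_AND_SAVE;
    # old: NONE for all of these steps.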

Test Plan:
Collected the following traces with this schedule:

schedule = torch.profiler.schedule(
    wait=10, warmup=3, active=1, repeat=1, skip_first=1, skip_first_wait=1
)
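For reference, a minimal harness along these lines can be used to produce such traces (a sketch; the CPU-only activities, step count, and ./traces output directory are illustrative placeholders, not part of the test plan):

    from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

    sched = schedule(wait=10, warmup=3, active=1, repeat=1, skip_first=1, skip_first_wait=1)

    with profile(
        activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on a GPU machine
        schedule=sched,
        on_trace_ready=tensorboard_trace_handler("./traces"),  # placeholder output dir
    ) as prof:
        for _ in range(20):
            # ... one training/inference step would go here ...
            prof.step()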

Differential Revision: D66465860

@pytorch-bot bot commented Nov 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141512

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 63136ab with merge base 6d4cd3e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor) commented: This pull request was exported from Phabricator. Differential Revision: D66465860

@sraikund16 self-assigned this Nov 25, 2024
@sraikund16 added the topic: improvements (topic category), release notes: profiler (release notes category), and ciflow/trunk (Trigger trunk jobs on your pull request) labels Nov 25, 2024

sraikund16 added a commit to sraikund16/pytorch that referenced this pull request Nov 26, 2024
Summary and Test Plan as in the PR description above.

Traces:
https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Nov_19_14_38_09.417495.pt.trace.json.gz&bucket=gpu_traces
https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Nov_19_14_29_32.337974.pt.trace.json.gz&bucket=gpu_traces

Reviewed By: aaronenyeshi

Differential Revision: D66465860

@facebook-github-bot (Contributor) commented:
@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Summary and Test Plan as in the PR description above.

Differential Revision: D66465860

Pull Request resolved: pytorch#141512
Approved by: https://github.com/aaronenyeshi

Labels

ciflow/trunk (Trigger trunk jobs on your pull request) · fb-exported · Merged · release notes: profiler (release notes category) · topic: improvements (topic category)
