Add skip_first_wait to profiler.schedule (V2) #141512
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141512
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 63136ab with merge base 6d4cd3e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D66465860
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#141512
Approved by: https://github.com/aaronenyeshi
Summary:
Another try for D66198138. The original diff hit an odd type-checking issue, so this version sets everything to int to work around it.
Addresses #91888
We use wait as the number of steps to wait between profiling cycles, and skip_first to delay the start of said profiling. However, once the skip_first steps are completed, the scheduler moves immediately into the wait phase. This is not a problem when wait is smaller than skip_first, since we can simply lower skip_first, but when wait is larger, the first profiling cycle starts much later than desired. For example, with skip_first=1, wait=100, and repeat=2, we do want to wait 100 steps between cycles 1 and 2, but we may not want the warmup of cycle 1 to start at step 101 (which is forced, because the wait phase follows directly after the skipped steps). This diff addresses that by adding a flag to skip the first wait.
The new flag defaults to false, so the existing implementation is not affected.
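The effect on the first warmup step can be illustrated with a small sketch. Note this is a simplified approximation of the torch.profiler.schedule step arithmetic written from the description above, not the actual PyTorch implementation; the phase function and its names are hypothetical.

```python
# Hypothetical sketch (NOT the real PyTorch code) of how a skip_first_wait
# flag changes which step the first warmup lands on, following the
# skip_first=1, wait=100, repeat=2 example above.

def phase(step, *, wait, warmup, active, repeat=0,
          skip_first=0, skip_first_wait=False):
    if step < skip_first:
        return "SKIP"
    step -= skip_first
    if skip_first_wait:
        step += wait  # jump past the initial wait phase
    cycle = wait + warmup + active
    if repeat and step >= cycle * repeat:
        return "NONE"  # all requested cycles are done
    pos = step % cycle
    if pos < wait:
        return "WAIT"
    return "WARMUP" if pos < wait + warmup else "ACTIVE"

cfg = dict(wait=100, warmup=3, active=1, repeat=2, skip_first=1)
default_start = next(s for s in range(400) if phase(s, **cfg) == "WARMUP")
fixed_start = next(s for s in range(400)
                   if phase(s, **cfg, skip_first_wait=True) == "WARMUP")
print(default_start, fixed_start)  # 101 1
```

With the default behavior the first warmup is pushed out to step 101; skipping the first wait pulls it back to step 1, while the 100-step wait between cycles 1 and 2 is preserved.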
Test Plan:
Got the following traces with this schedule:
schedule = torch.profiler.schedule(
    wait=10, warmup=3, active=1, repeat=1, skip_first=1, skip_first_wait=1
)
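Under the semantics described above, this test-plan schedule should skip step 0, skip the initial wait, warm up on steps 1-3, and record step 4. A small self-contained check of that expectation (approximating the scheduler logic from the description, not calling PyTorch):

```python
# Hypothetical sketch approximating torch.profiler.schedule phase logic,
# applied to the test-plan parameters:
# wait=10, warmup=3, active=1, repeat=1, skip_first=1, skip_first_wait=1.

def phase(step):
    wait, warmup, active, repeat, skip_first = 10, 3, 1, 1, 1
    if step < skip_first:
        return "SKIP"
    step = step - skip_first + wait  # skip_first_wait: jump past first wait
    cycle = wait + warmup + active
    if step >= cycle * repeat:
        return "NONE"  # single cycle (repeat=1), then profiling stops
    pos = step % cycle
    if pos < wait:
        return "WAIT"
    return "WARMUP" if pos < wait + warmup else "ACTIVE"

print([phase(s) for s in range(6)])
# ['SKIP', 'WARMUP', 'WARMUP', 'WARMUP', 'ACTIVE', 'NONE']
```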
Differential Revision: D66465860