-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[PP] Support OVERLAP_F_B computation type #158978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158978
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ You can merge normally! (1 Unrelated Failure)As of commit 1943aac with merge base e4b123b ( UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
| # Check if action has sub_actions | ||
| if action.sub_actions is not None: | ||
| # Process each sub_action instead of the main action | ||
| for sub_action in action.sub_actions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the idea here that you first call F for one stage/mb, and then call B for the other stage/mb?
If so, how do you ensure that they are able to overlap? I thought the point of OVERLAP_F_B action w~as to call 1 user function, where inside the user function they take care of properly overlapping things.
edit: sorry, missed that this is validate_schedule. i think this is a good way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inside the user function they take care of properly overlapping things.
Yep, that was the plan. The custom user function is not implemented in this PR yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this looks like its on the right track, but a couple things i'd like to see
-
examples of how the schedule would be authored (is the IR notation ergonomic? iiuc it has commas inside the op's text. Please add some parsing tests that also illustrate how this looks
-
how is this going to be connected to the executor/runtime? Without seeing this piece i'm a little hesitant to land the scaffolding part.
Some changes to validation code and visualizer to support a new computation type that will be used in DualPipeV (see pytorch/torchtitan#1447) cc awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta [ghstack-poisoned]
Some changes to validation code and visualizer to support a new computation type that will be used in DualPipeV (see pytorch/torchtitan#1447) cc awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta [ghstack-poisoned]
Some changes to validation code and visualizer to support a new computation type that will be used in DualPipeV (see pytorch/torchtitan#1447) cc awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta [ghstack-poisoned]
Some changes to validation code and visualizer to support a new computation type that will be used in DualPipeV (see pytorch/torchtitan#1447) cc awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta [ghstack-poisoned]
DualPipeV is generated through #159591, but yes if users want to create it strictly from the IR its not well documented. Current it looks like this: I'm thinking of just having it be
Right now DualPipeV is using the runtime. The workaround for OVERLAP since we dont have it implemented is to treat it as separate Forward and backward ops. This is working and able to run, the next steps would be to register a custom function for this operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, please add your IR snippet to the PR desc for future reference but i think thats a usable format.
Some changes to validation code and visualizer to support a new computation type that will be used in DualPipeV (see #159591) The IR looks like: ``` [0F0, 0F1, 0F2, 0F3, 0F4, 0F5, 0F6, 7F0, 7I0, 7W0, 7F1, 7I1, 7W1, 7F2, 7I2, 7W2, 7F3, (0F7;7B3)OVERLAP_F_B, (7F4;0B0)OVERLAP_F_B, (0F8;7B4)OVERLAP_F_B, (7F5;0B1)OVERLAP_F_B, (0F9;7B5)OVERLAP_F_B, (7F6;0B2)OVERLAP_F_B, 7B6, (7F7;0B3)OVERLAP_F_B, 7B7, (7F8;0B4)OVERLAP_F_B, 7B8, (7F9;0B5)OVERLAP_F_B, 7B9, 0I6, 0W6, 0I7, 0W7, 0I8, 0W8, 0I9, 0W9] [1F0, 1F1, 1F2, 1F3, 1F4, 6F0, 1F5, 6F1, 6I0, 6W0, 6F2, 6I1, 6W1, 6F3, (1F6;6B2)OVERLAP_F_B, (6F4;1B0)OVERLAP_F_B, (1F7;6B3)OVERLAP_F_B, (6F5;1B1)OVERLAP_F_B, (1F8;6B4)OVERLAP_F_B, (6F6;1B2)OVERLAP_F_B, (1F9;6B5)OVERLAP_F_B, (6F7;1B3)OVERLAP_F_B, 6B6, (6F8;1B4)OVERLAP_F_B, 6B7, (6F9;1B5)OVERLAP_F_B, 6B8, 1B6, 6I9, 1I7, 6W9, 1I8, 1W7, 1I9, 1W8, 1W9] [2F0, 2F1, 2F2, 5F0, 2F3, 5F1, 2F4, 5F2, 5I0, 5W0, 5F3, (2F5;5B1)OVERLAP_F_B, (5F4;2B0)OVERLAP_F_B, (2F6;5B2)OVERLAP_F_B, (5F5;2B1)OVERLAP_F_B, (2F7;5B3)OVERLAP_F_B, (5F6;2B2)OVERLAP_F_B, (2F8;5B4)OVERLAP_F_B, (5F7;2B3)OVERLAP_F_B, (2F9;5B5)OVERLAP_F_B, (5F8;2B4)OVERLAP_F_B, 5B6, (5F9;2B5)OVERLAP_F_B, 5B7, 2B6, 5B8, 2I7, 5I9, 2I8, 2W7, 2I9, 5W9, 2W8, 2W9] [3F0, 4F0, 3F1, 4F1, 3F2, 4F2, 3F3, 4F3, 3F4, 4B0, (4F4;3B0)OVERLAP_F_B, (3F5;4B1)OVERLAP_F_B, (4F5;3B1)OVERLAP_F_B, (3F6;4B2)OVERLAP_F_B, (4F6;3B2)OVERLAP_F_B, (3F7;4B3)OVERLAP_F_B, (4F7;3B3)OVERLAP_F_B, (3F8;4B4)OVERLAP_F_B, (4F8;3B4)OVERLAP_F_B, (3F9;4B5)OVERLAP_F_B, (4F9;3B5)OVERLAP_F_B, 4B6, 3B6, 4B7, 3B7, 4I8, 3I8, 4I9, 3I9, 4W8, 3W8, 4W9, 3W9] ``` In this PR, the schedule execution will just treat the OVERLAP_F_B as two separate operations of F and B (so there is no actual overlap). The next step is to allow users to create a custom function to plug in what this operation does. https://github.com/pytorch/pytorch/blob/814629043a0c31441bc3749204c97f1e24fa3462/torch/distributed/pipelining/schedules.py#L1205-L1216 cc awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta [ghstack-poisoned]
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Stack from ghstack (oldest at bottom):
Some changes to validation code and visualizer to support a new computation type that will be used in DualPipeV (see #159591)
The IR looks like:
In this PR, the schedule execution will just treat the OVERLAP_F_B as two separate operations of F and B (so there is no actual overlap). The next step is to allow users to create a custom function to plug in what this operation does.
pytorch/torch/distributed/pipelining/schedules.py
Lines 1205 to 1216 in 8146290
cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta