Determine autograd engine ready queue based on InputMetadata instead of InputBuffer by soulitzer · Pull Request #135633 · pytorch/pytorch · GitHub

@soulitzer soulitzer commented Sep 10, 2024

Stack from ghstack (oldest at bottom):

Thanks @awgu for raising this issue and providing the small repro.

From offline discussion with @albanD: when a forward returns multiple outputs on different devices, we want to select the ready queue based on the device of the first output. Even though this choice is somewhat arbitrary, we prefer it over choosing the ready queue based on whichever input buffer we happen to compute last, which can vary with more factors and is thus harder to reason about. This is in theory BC-breaking, but it seems unlikely that anyone depends on the current behavior.
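
To illustrate the concern, here is a toy C++ sketch (not the engine's code). It assumes a simplified model in which a device is a plain string and each policy is reduced to a hypothetical helper, `queue_by_last_arrival` or `queue_by_first_metadata`; neither name exists in PyTorch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: a device is just a string such as "cpu" or "cuda:0".
// The real engine uses c10::Device; both helpers below are hypothetical.

// Order-dependent policy: pick the queue from whichever gradient happened
// to be accumulated last. `arrival_order` must be non-empty.
std::string queue_by_last_arrival(const std::vector<std::string>& arrival_order) {
  assert(!arrival_order.empty());
  return arrival_order.back();
}

// Order-independent policy: pick the queue from the device recorded for the
// first forward output. `metadata_devices` must be non-empty.
std::string queue_by_first_metadata(const std::vector<std::string>& metadata_devices) {
  assert(!metadata_devices.empty());
  return metadata_devices.front();
}
```

Under this model, swapping the order in which the two gradients arrive changes the result of the first helper but not the second, which is the property the description argues for.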

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@soulitzer soulitzer requested a review from albanD as a code owner September 10, 2024 23:22
pytorch-bot bot commented Sep 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135633

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ae8e3bc with merge base c7b0d4b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

soulitzer added a commit that referenced this pull request Sep 10, 2024
soulitzer added a commit that referenced this pull request Sep 11, 2024
@soulitzer soulitzer changed the title from "Determine autograd engine reqdyqueue based on first output's device" to "Determine autograd engine ready queue based on first output's device" Sep 11, 2024
  if (is_ready) {
-   auto queue = ready_queue(cpu_ready_queue, input_buffer.device());
+   // NB: The first forward output's device determines the queue
+   auto queue = ready_queue(cpu_ready_queue, next.function->input_metadata(0).device());
Collaborator

I think we want to preserve the prioritization of non-CPU devices.
Could you update this by passing the input metadata to the InputBuffer struct and making the .device() method there use it instead of the current buffer?
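
A minimal sketch of what this suggestion could look like, using toy types rather than the real `InputBuffer` and `InputMetadata` classes. `ToyInputMetadata`, `ToyInputBuffer`, and the string device representation are assumptions made for illustration; the change that was eventually merged may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-input metadata recorded at forward time.
struct ToyInputMetadata {
  std::string device;  // e.g. "cpu" or "cuda:0"
};

// Hypothetical stand-in for the buffer: it keeps the metadata it was given
// and answers device() from that metadata, not from accumulated gradients.
class ToyInputBuffer {
 public:
  explicit ToyInputBuffer(std::vector<ToyInputMetadata> metadata)
      : metadata_(std::move(metadata)) {}

  // Return the first non-CPU device in input order; fall back to CPU.
  std::string device() const {
    for (const auto& m : metadata_) {
      if (m.device != "cpu") {
        return m.device;
      }
    }
    return "cpu";
  }

 private:
  std::vector<ToyInputMetadata> metadata_;
};
```

Because the result depends only on metadata fixed at forward time, it does not change with the order in which gradients arrive, while a non-CPU input still takes priority over a CPU one.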

Contributor Author

Ah good catch!

@soulitzer soulitzer added the release notes: autograd release notes category label Oct 4, 2024
@soulitzer soulitzer changed the title from "Determine autograd engine ready queue based on first output's device" to "Determine autograd engine ready queue based on InputMetadata instead of InputBuffer" Oct 4, 2024
soulitzer added a commit that referenced this pull request Oct 4, 2024
@albanD albanD left a comment

SGTM!

awgu commented Oct 4, 2024

Thanks for this, @soulitzer!

@soulitzer
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 4, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@github-actions github-actions bot deleted the gh/soulitzer/327/head branch November 6, 2024 02:10
Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: inductor, release notes: autograd (release notes category)

4 participants