handle state tensors in training ir path by avikchaudhuri · Pull Request #137240 · pytorch/pytorch

Conversation

@avikchaudhuri (Contributor)

Summary: We had attribute assignment detection and handling of registered buffer assignments when using `aot_autograd`, but not when using just `make_fx`. Fixed.
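A minimal sketch of the kind of module this concerns — a `forward` that assigns to a registered buffer — exported through `torch.export` (module and names are illustrative, not taken from the actual test, and tracing the assignment assumes this fix is in place):

```python
import torch

class StateModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # a registered buffer that forward reassigns
        self.register_buffer("state", torch.zeros(3))

    def forward(self, x):
        # attribute assignment to a registered buffer; export must
        # detect this and model it as a buffer mutation
        self.state = self.state + x
        return self.state.sum()

ep = torch.export.export(StateModule(), (torch.ones(3),))
print(ep.graph_module.graph)
```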

Test Plan: expanded coverage of `test_state_tensors` to use `export` instead of `torch.export.export`

Differential Revision: D63802576

@pytorch-bot (bot) commented Oct 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137240

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit a6338cd with merge base e80f47f:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D63802576

Contributor:

In training IR, we follow the convention that we don't return mutated buffers. Could we just write it as an in-place update?

Contributor Author (@avikchaudhuri):

If the source code has `buf = new_value`, can we treat it the same as `buf.copy_(new_value)`?
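For context, the two are not equivalent in eager mode: assignment rebinds the name to a new tensor, while `copy_` writes into the existing storage, which existing aliases observe. A quick standalone illustration:

```python
import torch

buf = torch.zeros(3)
alias = buf
buf = torch.ones(3)       # rebinding: alias still sees the old storage
print(alias)              # tensor([0., 0., 0.])

buf = torch.zeros(3)
alias = buf
buf.copy_(torch.ones(3))  # in-place: alias observes the update
print(alias)              # tensor([1., 1., 1.])
```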

Contributor:

I think this is OK as long as the metadata (i.e., the shape) remains the same. When we unlift, we just write them as `copy_` anyway... so it seems fine? What do you think?
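A conceptual sketch of that unlifting convention, assuming the lifted graph returns the new buffer value (illustrative, not the actual pass):

```python
import torch
from types import SimpleNamespace

def lifted(buf, x):
    # the lifted, functional form returns the updated buffer value
    new_buf = buf + x
    return new_buf, new_buf.sum()

def unlifted(module, x):
    # unlifting writes the result back with copy_, which is only
    # valid if the new value matches the buffer's shape/metadata
    new_buf, out = lifted(module.state, x)
    module.state.copy_(new_buf)
    return out

m = SimpleNamespace(state=torch.zeros(3))
print(unlifted(m, torch.ones(3)), m.state)  # m.state is now all ones
```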

Contributor:

Same here: can we make it a helper function in `aot_autograd` and use it here?

Contributor Author (@avikchaudhuri):

This part is going to be substantially different: it adds `copy_` nodes instead of mutating the buffer. I'll dedup the other part.

Contributor:

Could we make this a util function in `aot_autograd.py` and use that here? I think this logic also exists in `aot_autograd.py`.

Contributor Author (@avikchaudhuri):

Mostly true, although there we see buffers as `Functional(Fake(...))` and here just `Fake(...)`.
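One way a shared helper could absorb that difference is to unwrap the functional layer before checking; a hypothetical sketch (the helper name is illustrative, and the `elem` access is an assumption about the subclass internals):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor
from torch._subclasses.functional_tensor import FunctionalTensor

def is_state_fake_tensor(t: torch.Tensor) -> bool:
    # aot_autograd path: a FunctionalTensor wrapping a FakeTensor
    if isinstance(t, FunctionalTensor):
        t = t.elem  # unwrap to the inner tensor (assumed attribute)
    # make_fx path: a plain FakeTensor
    return isinstance(t, FakeTensor)
```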

Contributor:

Could we filter out the other parts? What I'm worried about is that this part is a bit complicated to understand, so it would be better if we had only one place to change things. Maybe the util function can take the instance type we want to assert on.
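A hypothetical shape for that parameterized util (all names illustrative):

```python
from torch._subclasses.fake_tensor import FakeTensor
from torch._subclasses.functional_tensor import FunctionalTensor

def assert_buffer_types(named_buffers, expected_cls):
    # hypothetical shared helper: each caller passes the tensor
    # subclass its tracing path produces, so the assertion logic
    # lives in one place
    for name, buf in named_buffers:
        assert isinstance(buf, expected_cls), (
            f"buffer {name!r} is {type(buf).__name__}, "
            f"expected {expected_cls.__name__}"
        )

# make_fx path sees plain fake tensors:
#   assert_buffer_types(mod.named_buffers(), FakeTensor)
# aot_autograd path sees functional wrappers:
#   assert_buffer_types(mod.named_buffers(), FunctionalTensor)
```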


Summary:
Pull Request resolved: pytorch#137240

We had attribute assignment detection and handling of registered buffer assignments when using `aot_autograd`, but not when using just `make_fx`. Fixed.

Test Plan: expanded coverage of `test_state_tensors` to use `export` instead of `torch.export.export`

Differential Revision: D63802576

@pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Oct 4, 2024
@facebook-github-bot (Contributor)

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
