NJT <-> padded dense conversions #125947
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125947
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 6b9f037 with merge base bc1b8f0:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Related: an old discussion on this being a useful primitive (back in the day, for the collate_fn of data loading) and deserving more fame :) One useful thing here would also be to support "padding multiples" per dimension.
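For illustration, a hedged sketch of what "padding multiples" could look like on top of `to_padded_tensor` (the helper name and rounding step are hypothetical, not an API from this PR):

```python
import torch
import torch.nn.functional as F

# Pad the ragged dim of a jagged-layout NJT up to a multiple of `multiple`.
def to_padded_multiple(nt, multiple=8, value=0.0):
    padded = nt.to_padded_tensor(value)           # (B, max_len, D)
    max_len = padded.size(1)
    target = -(-max_len // multiple) * multiple   # round max_len up to the multiple
    return F.pad(padded, (0, 0, 0, target - max_len), value=value)

nt = torch.nested.nested_tensor(
    [torch.randn(3, 4), torch.randn(5, 4)], layout=torch.jagged
)
print(to_padded_multiple(nt).shape)  # torch.Size([2, 8, 4])
```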
Maybe one way to auto-construct an NJT from the torch.stack([...]) call in the default collate could be something like the sketch below. This way the collate_fn code could be kept unchanged, but if the inputs are wrapped, torch.stack would start to produce an NJT...
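To make the idea concrete, a hypothetical sketch (not part of this PR) of a thin tensor subclass whose `__torch_function__` reroutes `torch.stack` to build a jagged-layout NJT, leaving the default collate_fn untouched:

```python
import torch

class AsNested(torch.Tensor):
    # If any element of the stacked list is AsNested, torch.stack dispatches here.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.stack:
            tensors = [t.as_subclass(torch.Tensor) for t in args[0]]
            return torch.nested.nested_tensor(tensors, layout=torch.jagged)
        return super().__torch_function__(func, types, args, kwargs)

samples = [torch.randn(3, 8).as_subclass(AsNested),
           torch.randn(5, 8).as_subclass(AsNested)]
batch = torch.stack(samples)  # yields an NJT instead of erroring on ragged shapes
```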
@pytorchbot revert -m 'Sorry for reverting your change but it is failing dynamo test https://hud.pytorch.org/pytorch/pytorch/commit/09a5e88bef04d5485b70d8f65f46a675aaa52942, maybe a landrace' -c landrace
@pytorchbot successfully started a revert job. Check the current status here.
@jbschlosser your PR has been successfully reverted.
This reverts commit 09a5e88. Reverted #125947 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing dynamo test https://hud.pytorch.org/pytorch/pytorch/commit/09a5e88bef04d5485b70d8f65f46a675aaa52942, maybe a landrace
Attention! native_functions.yaml was changed.
If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one that adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
Caused by:
This PR:
* Implements the pre-existing `nt.to_padded_tensor(padding_val)` ATen op via the FBGEMM kernel + appropriate view gymnastics (since that kernel only handles 2D values); see the usage sketch below
* Introduces a new `_nested_from_padded_tensor` op for the reverse conversion, implemented via the reverse FBGEMM kernel + view gymnastics
* Note: there is currently no public API for this; the design is deferred to a future PR
TODO:
* ~~Propagate min / max sequence length via the new factory function `_nested_from_padded_tensor`~~
* ~~Verify that Inductor does computation fusion via test logic~~
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec
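A minimal usage sketch of the forward conversion described above (assuming a build with jagged-layout NJT support; the reverse direction has no public API yet, per the note above):

```python
import torch

# Build an NJT from variable-length sequences, then convert to a padded dense tensor.
seqs = [torch.randn(3, 8), torch.randn(5, 8), torch.randn(2, 8)]
nt = torch.nested.nested_tensor(seqs, layout=torch.jagged)

# Pad the ragged dim out to the max sequence length (5 here) with the given fill value.
padded = nt.to_padded_tensor(0.0)
print(padded.shape)  # torch.Size([3, 5, 8])
```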
@pytorchbot merge
Changing the meta registration for …
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
`rms_norm()` is a nice-to-have for ViT :) This PR:
* SymInt-ifies `rms_norm()`, allowing NJT to use the same decomp.
* Adds torch_function-based input validation logic for nested-specific stuff (no normalization supported over the ragged dim for now) on the python NJT side.
* Adds multi-dim support (on non-ragged, non-batch dims) to `mean()` for NJT.
Pull Request resolved: #135872
Approved by: https://github.com/mikaylagawarecki
ghstack dependencies: #125947
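A hedged sketch of what this enables for NJT (assuming a PyTorch build that contains this follow-up change):

```python
import torch
import torch.nn.functional as F

# rms_norm over the last (non-ragged) dim of a jagged-layout NJT reuses the
# dense decomp via SymInts; normalizing over the ragged dim is rejected.
nt = torch.nested.nested_tensor(
    [torch.randn(3, 16), torch.randn(5, 16)], layout=torch.jagged
)
out = F.rms_norm(nt, (16,))
print(out.shape)  # (2, j1, 16), with j1 the ragged dim
```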
Pull Request resolved: pytorch#125947
Approved by: https://github.com/soulitzer