Flex + NJT: cross attention support #140723

jbschlosser · 2024-11-14T16:56:25Z

Stack from ghstack (oldest at bottom):

-> Flex + NJT: cross attention support #140723

Allows ragged structures for query and key+value sequence lengths to differ (i.e. supports cross attention for Flex + NJT).

Technically, this is BC-breaking thanks to arg renaming and positional arg reordering in create_nested_block_mask(), but Flex + NJT support isn't in a major release yet so I'm hoping we can just do it.
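
For illustration only, here is a minimal pure-Python sketch of what it means for the query and key+value ragged structures to differ in a packed (jagged) layout. It is not the code in this PR and does not call any PyTorch API; the helper names (`build_seq_ids`, `cross_attention_mask`) and the example lengths are hypothetical. It assumes both batches hold the same number of sequences and that a query position may attend only to key+value positions belonging to the same sequence.

```python
def build_seq_ids(lengths):
    """Map each packed position to the index of the sequence it belongs to."""
    return [seq for seq, n in enumerate(lengths) for _ in range(n)]


def cross_attention_mask(q_lengths, kv_lengths):
    """Dense boolean mask of shape (sum(q_lengths), sum(kv_lengths)).

    Entry [i][j] is True when packed query position i and packed key+value
    position j fall in the same sequence of the batch.
    """
    if len(q_lengths) != len(kv_lengths):
        raise ValueError("query and key+value batches must have the same number of sequences")
    q_ids = build_seq_ids(q_lengths)
    kv_ids = build_seq_ids(kv_lengths)
    return [[q == kv for kv in kv_ids] for q in q_ids]


# Two sequences: queries have lengths 2 and 1, keys/values have lengths 3 and 2.
mask = cross_attention_mask([2, 1], [3, 2])
```

With these lengths the mask has 3 rows (packed query positions) and 5 columns (packed key+value positions), which is the shape mismatch that self-attention-only support could not express.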

[ghstack-poisoned]

pytorch-bot · 2024-11-14T16:56:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140723

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 35d6575 with merge base 2675ef8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg

Looks good. I would turn on the ROCm jobs just to be sure there aren't any failures.

jbschlosser · 2024-11-18T16:20:41Z

@pytorchbot merge

pytorchmergebot · 2024-11-18T16:22:23Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-11-18T19:50:50Z

This PR (#140723) was merged in e80b1b2 but it is still open, likely due to a GitHub bug, so mergebot is closing it manually. If you think this is a mistake, please feel free to reopen and contact Dev Infra.

Fixes pytorch#140598

Allows ragged structures for query and key+value sequence lengths to differ (i.e. supports cross attention for Flex + NJT).

Technically, this is BC-breaking thanks to arg renaming and positional arg reordering in `create_nested_block_mask()`, but Flex + NJT support isn't in a major release yet so I'm hoping we can just do it.

Pull Request resolved: pytorch#140723
Approved by: https://github.com/drisspg

Flex + NJT: cross attention support

b31f9df

[ghstack-poisoned]

jbschlosser requested review from albanD and mikaylagawarecki as code owners November 14, 2024 16:56

jbschlosser added the topic: improvements (topic category) and release notes: nested tensor (Changes that have a direct impact on nested tensors) labels Nov 14, 2024

jbschlosser requested review from cpuhrsch, drisspg and soulitzer and removed request for albanD November 14, 2024 16:59

jbschlosser added the suppress-bc-linter label (Suppresses the failures of API backward-compatibility linter, Lint/bc_linter) Nov 14, 2024

jbschlosser added a commit that referenced this pull request Nov 14, 2024

Flex + NJT: cross attention support

2808b4e

ghstack-source-id: 28240de Pull Request resolved: #140723

drisspg approved these changes Nov 15, 2024

jbschlosser added the ciflow/rocm label (Trigger "default" config CI on ROCm) Nov 18, 2024

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Nov 18, 2024

pytorchmergebot added the merging label Nov 18, 2024

pytorchmergebot added the Merged label Nov 18, 2024

pytorchmergebot closed this Nov 18, 2024

pytorchmergebot removed the merging label Nov 18, 2024

jbschlosser mentioned this pull request Nov 19, 2024

NJT + Flex *cross* attention #140598

Closed

jbschlosser mentioned this pull request Nov 20, 2024

Fix NJT linear_backward() memory usage #141163

Closed

github-actions bot deleted the gh/jbschlosser/200/head branch December 19, 2024 02:09
