[decomp] Fix _scaled_dot_product_flash_attention decomposition bug by larryliu0820 · Pull Request #113102 · pytorch/pytorch · GitHub

Conversation

@larryliu0820
Contributor

@larryliu0820 larryliu0820 commented Nov 7, 2023

Stack from ghstack (oldest at bottom):

For `_scaled_dot_product_flash_attention` we don't have

`Tensor? attn_mask=None`

but `scaled_dot_product_attention` does. In the original decomp there was a
mix-up where I added this argument to
`_scaled_dot_product_flash_attention`.

Fix it so that `_scaled_dot_product_flash_attention` is decomposed correctly.
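Purely as illustration (this is not the code touched by this PR), here is a minimal reference sketch of what the flash variant computes, with a parameter list that mirrors `_scaled_dot_product_flash_attention` and therefore has no `attn_mask`. The helper name and the shapes in the usage lines are made up for the example.

```python
import math
import torch

def sdpa_flash_reference(query, key, value, dropout_p=0.0, is_causal=False, scale=None):
    # Parameter list mirrors aten._scaled_dot_product_flash_attention:
    # unlike F.scaled_dot_product_attention, there is no attn_mask argument.
    scale = scale if scale is not None else 1.0 / math.sqrt(query.size(-1))
    attn = (query @ key.transpose(-2, -1)) * scale
    if is_causal:
        L, S = query.size(-2), key.size(-2)
        causal = torch.ones(L, S, dtype=torch.bool, device=query.device).tril()
        attn = attn.masked_fill(~causal, float("-inf"))
    attn = attn.softmax(dim=-1)
    if dropout_p > 0.0:
        attn = torch.dropout(attn, dropout_p, train=True)
    return attn @ value

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = k = v = torch.randn(2, 4, 8, 16)
out = sdpa_flash_reference(q, k, v, is_causal=True)
```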

For `F.scaled_dot_product_attention` we have a positional argument:

`Tensor? attn_mask=None`

This is not handled correctly in `python_variables.cpp`, and it causes
issues in tracing.

Fix it so that the `is_default` lambda correctly returns true when the default value is `None`.
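A hedged repro sketch of the tracing scenario described above, assuming `torch.jit.trace` is the tracing path in question; the module and shapes are illustrative and not part of this PR. The point is that `attn_mask` is left at its `None` default, which is exactly the case the `is_default` check has to handle.

```python
import torch
import torch.nn.functional as F

class Attn(torch.nn.Module):
    def forward(self, q, k, v):
        # attn_mask is left at its positional default of None.
        return F.scaled_dot_product_attention(q, k, v)

q = k = v = torch.randn(2, 4, 8, 16)
traced = torch.jit.trace(Attn(), (q, k, v))
# attn_mask should show up as its None default in the recorded graph.
print(traced.graph)
```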

@pytorch-bot

pytorch-bot bot commented Nov 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113102

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f7a6836 with merge base bdfde62:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

larryliu0820 added a commit that referenced this pull request Nov 7, 2023

ghstack-source-id: 5a280b4
Pull Request resolved: #113102
@larryliu0820 larryliu0820 changed the title from "Fix a small bug for positional argument default value being None" to "[decomp] Fix _scaled_dot_product_flash_attention decomposition bug" Nov 7, 2023
larryliu0820 added a commit that referenced this pull request Nov 7, 2023

ghstack-source-id: bb643d2
Pull Request resolved: #113102
Contributor

@ezyang ezyang left a comment


Thanks a lot!

@albanD albanD requested a review from drisspg November 8, 2023 18:55
@larryliu0820
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 8, 2023
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@larryliu0820
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

@facebook-github-bot facebook-github-bot deleted the gh/larryliu0820/43/head branch November 12, 2023 15:24
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023

Pull Request resolved: pytorch#113102
Approved by: https://github.com/ezyang

Labels

ciflow/inductor · ciflow/trunk · Merged · topic: not user facing


3 participants