Set dropout in SDPA to 0.0 when not in training mode #1803
Merged
As pointed out by @zjost in #1791, we probably should just manually force dropout to be 0 outside of training. This is actually what we did originally, but along the way we switched to always using the value from `attn_dropout` directly, which will not be correct at inference time if someone passes a nonzero value. So this PR changes back to coercing dropout to 0.0 outside of training mode. I checked with the author of the PR that first dropped the if/else dropout logic, and it was not done by design.

I also make two other changes: (1) reverting the doc update I made last night, and (2) raising an error in case someone tries to use FlexAttention with nonzero dropout (it's not currently supported).
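For illustration, here is a minimal sketch of both changes (not the actual module from this repo; the class name, constructor signature, and `use_flex_attention` flag are hypothetical):

```python
import torch.nn as nn
import torch.nn.functional as F


class SDPAttention(nn.Module):
    """Hypothetical attention module illustrating the dropout fix."""

    def __init__(self, attn_dropout: float = 0.0, use_flex_attention: bool = False):
        super().__init__()
        if use_flex_attention and attn_dropout > 0.0:
            # FlexAttention does not currently support dropout, so fail loudly up front.
            raise ValueError("FlexAttention does not support attn_dropout > 0.0")
        self.attn_dropout = attn_dropout
        self.use_flex_attention = use_flex_attention

    def forward(self, q, k, v):
        # Coerce dropout to 0.0 outside of training mode. Passing
        # self.attn_dropout unconditionally would apply dropout at
        # inference time whenever a nonzero value was configured.
        dropout_p = self.attn_dropout if self.training else 0.0
        return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
```

With this pattern, `model.eval()` automatically disables attention dropout via `self.training`, matching the behavior of other dropout layers in PyTorch.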