[FlexAttention] Fix another IMA with captured buffers by drisspg · Pull Request #141164 · pytorch/pytorch · GitHub

Conversation


@drisspg drisspg commented Nov 20, 2024

Stack from ghstack (oldest at bottom):

Summary

We have another IMA (illegal memory access) for captured buffers when the sequence lengths are not evenly divisible by the block size.

Running the test before this commit:

```Shell
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 447 errors
========= ERROR SUMMARY: 347 errors were not printed. Use --print-limit option to adjust the number of printed errors
```

And after:

```Shell
❯ CUDA_LAUNCH_BLOCKING=1 PYTORCH_NO_CUDA_MEMORY_CACHING=1 compute-sanitizer --tool memcheck pytest test/inductor/test_flex_attention.py -k "test_non_divisible_with_captured_buffer"
========= COMPUTE-SANITIZER
====================================================== test session starts =======================================================
platform linux -- Python 3.12.7, pytest-7.4.0, pluggy-1.5.0
rootdir: /home/drisspg/meta/pytorch
configfile: pytest.ini
plugins: hypothesis-6.115.5, typeguard-4.3.0
collected 518 items / 517 deselected / 1 selected
Running 1 items in this shard

test/inductor/test_flex_attention.py .                                                                                     [100%]

=============================================== 1 passed, 517 deselected in 13.31s ===============================================
========= ERROR SUMMARY: 0 errors
```
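The failure mode can be illustrated with a small, self-contained sketch (plain Python, not the actual Triton kernel; `BLOCK`, `SEQ_LEN`, and the `bias` buffer are made-up illustration values): block-based kernels round the sequence up to a whole number of blocks, so the final partial block produces indices past the end of any captured buffer unless the load is masked.

```python
# Conceptual sketch of the IMA, not the real kernel: FlexAttention walks the
# sequence in fixed-size blocks. When seq_len is not a multiple of BLOCK, the
# last block's indices run past seq_len; a score_mod that indexes a captured
# buffer with those raw indices reads out of bounds, which is exactly what
# compute-sanitizer flags. Guarding the load with an in-bounds mask fixes it.

BLOCK = 128
SEQ_LEN = 300            # 300 % 128 != 0 -> the last block is partial
bias = [0.1] * SEQ_LEN   # stands in for a captured buffer of shape [SEQ_LEN]

def unguarded_indices(block_id):
    # Raw per-block indices, as a kernel lane would compute them.
    start = block_id * BLOCK
    return list(range(start, start + BLOCK))

def guarded_load(buf, idx):
    # Masked load: out-of-bounds lanes return a neutral value instead of
    # touching memory past the end of the captured buffer.
    return [buf[i] if i < len(buf) else 0.0 for i in idx]

last_block = SEQ_LEN // BLOCK       # block 2 covers indices 256..383
idx = unguarded_indices(last_block)
assert max(idx) >= SEQ_LEN          # the partial block over-runs the buffer

vals = guarded_load(bias, idx)
assert len(vals) == BLOCK
# Lanes past position 299 are masked off rather than read out of bounds.
assert all(v == 0.0 for v in vals[SEQ_LEN - last_block * BLOCK:])
```

In the real kernel the same idea is expressed as a mask on the Triton load for the captured-buffer indices in the non-divisible tail block.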

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @Chillee @yanboliang @BoyuanFeng

[ghstack-poisoned]

pytorch-bot bot commented Nov 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141164

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Unrelated Failure

As of commit 2180591 with merge base 2ee2dcb:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorchmergebot pushed a commit that referenced this pull request Nov 21, 2024
# Summary
Previous custom op library name was a little verbose and didn't really align with how we typically name our libraries.

Pull Request resolved: #141185
Approved by: https://github.com/Chillee
ghstack dependencies: #141164
pytorchmergebot pushed a commit that referenced this pull request Nov 22, 2024
# Summary
### Before
```Shell
48.71s call     test/inductor/test_flex_attention.py::TestFlexAttention::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_aot_eager
```
### After
Speeds up grad check tests by 10x
```Shell
4.74s call     test/inductor/test_flex_attention.py::TestFlexAttention::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_aot_eager
```

Pull Request resolved: #141356
Approved by: https://github.com/BoyuanFeng
ghstack dependencies: #141164, #141185
pobin6 pushed commits to pobin6/pytorch that referenced this pull request Dec 5, 2024
@github-actions github-actions bot deleted the gh/drisspg/83/head branch December 22, 2024 02:10
