[FlexAttention] Fix another IMA with captured buffers #141164

drisspg · 2024-11-20T20:45:11Z

Stack from ghstack (oldest at bottom):

Summary

We have another IMA (illegal memory access) for captured buffers when the sequence lengths are not divisible by the kernel block size.
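For readers unfamiliar with this class of bug, here is a conceptual sketch in plain Python of how a tiled kernel can read past the end of a captured buffer when the sequence length is not a multiple of the block size. It is not the FlexAttention kernel or the fix in this PR; the function name `gather_bias_tiled` and its parameters are hypothetical and exist only for illustration.

```python
def gather_bias_tiled(bias, seq_len, block_size, masked=True):
    """Simulate a tiled kernel that reads bias[q_idx] once per query index.

    Hypothetical illustration only. Returns (values, out_of_bounds_reads).
    """
    num_blocks = -(-seq_len // block_size)  # ceiling division
    values = []
    oob_reads = 0
    for block in range(num_blocks):
        for lane in range(block_size):
            q_idx = block * block_size + lane
            if q_idx >= seq_len:
                # The last block overhangs the sequence.
                if not masked:
                    # An unguarded load here is what a GPU memory checker
                    # would report as an illegal memory access.
                    oob_reads += 1
                continue
            values.append(bias[q_idx])
    return values, oob_reads


# seq_len=10 with block_size=4 needs 3 blocks (12 lanes), so 2 lanes overhang.
bias = [0.1 * i for i in range(10)]
_, unguarded = gather_bias_tiled(bias, seq_len=10, block_size=4, masked=False)
_, guarded = gather_bias_tiled(bias, seq_len=10, block_size=4, masked=True)
print(unguarded, guarded)
```

When the sequence length is a multiple of the block size, no lane overhangs and both variants behave identically, which is why this kind of bug tends to show up only in non-divisible test cases.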

Running test before this commit:

========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 447 errors
========= ERROR SUMMARY: 347 errors were not printed. Use --print-limit option to adjust the number of printed errors

And after:

❯ CUDA_LAUNCH_BLOCKING=1 PYTORCH_NO_CUDA_MEMORY_CACHING=1 compute-sanitizer --tool memcheck pytest test/inductor/test_flex_attention.py -k "test_non_divisible_with_captured_buffer"
========= COMPUTE-SANITIZER
====================================================== test session starts =======================================================
platform linux -- Python 3.12.7, pytest-7.4.0, pluggy-1.5.0
rootdir: /home/drisspg/meta/pytorch
configfile: pytest.ini
plugins: hypothesis-6.115.5, typeguard-4.3.0
collected 518 items / 517 deselected / 1 selected                                                                                
Running 1 items in this shard

test/inductor/test_flex_attention.py .                                                                                     [100%]

=============================================== 1 passed, 517 deselected in 13.31s ===============================================
========= ERROR SUMMARY: 0 errors
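
To compare the before and after runs mechanically, the `ERROR SUMMARY` lines above can be parsed from a saved log. The following is a minimal sketch, assuming the log has been captured as text; the helper name `sanitizer_error_count` is hypothetical and not part of compute-sanitizer or PyTorch.

```python
import re

# Matches the total line, e.g. "========= ERROR SUMMARY: 447 errors", but not
# the follow-up "... 347 errors were not printed ..." line.
_SUMMARY = re.compile(r"ERROR SUMMARY: (\d+) errors?\s*$")


def sanitizer_error_count(log_text):
    """Return the count from the first 'ERROR SUMMARY: N errors' line, or None."""
    for line in log_text.splitlines():
        match = _SUMMARY.search(line)
        if match:
            return int(match.group(1))
    return None


before = (
    "========= ERROR SUMMARY: 447 errors\n"
    "========= ERROR SUMMARY: 347 errors were not printed. "
    "Use --print-limit option to adjust the number of printed errors\n"
)
after = "========= ERROR SUMMARY: 0 errors\n"
print(sanitizer_error_count(before), sanitizer_error_count(after))
```

A CI wrapper could fail the job whenever the returned count is nonzero or `None` (no summary line found).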

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @Chillee @yanboliang @BoyuanFeng

[ghstack-poisoned]

pytorch-bot · 2024-11-20T20:45:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141164

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

❌ 1 New Failure, 1 Unrelated Failure

As of commit 2180591 with merge base 2ee2dcb:

NEW FAILURE - The following job has failed:

rocm / linux-focal-rocm6.2-py3.10 / test (default, 2, 6, linux.rocm.gpu.2) (gh)
inductor/test_memory_planning.py::TestMemoryPlanning::test_python_wrapper

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

rocm / linux-focal-rocm6.2-py3.10 / test (default, 6, 6, linux.rocm.gpu.2) (gh) (trunk failure)
inductor/test_padding.py::PaddingTest::test_nobias_LinearAndSoftmax_codegen

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

# Summary
Previous custom op library name was a little verbose and didn't really align with how we typically name our libraries.

Pull Request resolved: #141185
Approved by: https://github.com/Chillee
ghstack dependencies: #141164

# Summary
### Before
```Shell
48.71s call     test/inductor/test_flex_attention.py::TestFlexAttention::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_aot_eager
```
### After
Speeds up grad check tests by 10x
```Shell
4.74s call     test/inductor/test_flex_attention.py::TestFlexAttention::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_aot_eager
```

Pull Request resolved: #141356
Approved by: https://github.com/BoyuanFeng
ghstack dependencies: #141164, #141185

# Summary
We have another IMA (illegal memory access) for captured buffers when the sequence lengths are not divisible by the kernel block size.

Running test before this commit:
```Shell
========= Error: process didn't terminate successfully
========= Target application returned an error
========= ERROR SUMMARY: 447 errors
========= ERROR SUMMARY: 347 errors were not printed. Use --print-limit option to adjust the number of printed errors
```

And after:
```Shell
❯ CUDA_LAUNCH_BLOCKING=1 PYTORCH_NO_CUDA_MEMORY_CACHING=1 compute-sanitizer --tool memcheck pytest test/inductor/test_flex_attention.py -k "test_non_divisible_with_captured_buffer"
========= COMPUTE-SANITIZER
====================================================== test session starts =======================================================
platform linux -- Python 3.12.7, pytest-7.4.0, pluggy-1.5.0
rootdir: /home/drisspg/meta/pytorch
configfile: pytest.ini
plugins: hypothesis-6.115.5, typeguard-4.3.0
collected 518 items / 517 deselected / 1 selected
Running 1 items in this shard

test/inductor/test_flex_attention.py .                                                                                     [100%]

=============================================== 1 passed, 517 deselected in 13.31s ===============================================
========= ERROR SUMMARY: 0 errors
```

Pull Request resolved: pytorch#141164
Approved by: https://github.com/Chillee


Update

c30684e

[ghstack-poisoned]

drisspg mentioned this pull request Nov 20, 2024

[FlexAttention] add support for learnable biases in Inductor #137452

Closed

pytorch-bot bot added the ciflow/inductor and module: inductor labels Nov 20, 2024

drisspg added the topic: not user facing, ciflow/rocm, and module: flex attention labels Nov 20, 2024

drisspg requested a review from Chillee November 20, 2024 21:08

Update

2180591

[ghstack-poisoned]

drisspg mentioned this pull request Nov 20, 2024

[FlexAttention] Rename zeros_and_scatter library #141185

Closed

drisspg requested a review from yanboliang November 21, 2024 03:14

Chillee approved these changes Nov 21, 2024


pytorchmergebot added the Merged label Nov 21, 2024

pytorchmergebot closed this in 073cbf2 Nov 21, 2024

github-actions bot deleted the gh/drisspg/83/head branch December 22, 2024 02:10
