[ROCm] Use ieee precision for fp32 in flex attention by jataylo · Pull Request #135702 · pytorch/pytorch · GitHub

Conversation

@jataylo
Collaborator

@jataylo jataylo commented Sep 11, 2024

@jataylo jataylo added the rocm, ciflow/rocm, and ciflow/inductor-rocm labels Sep 11, 2024
@jataylo jataylo requested review from Chillee and drisspg September 11, 2024 15:53
@pytorch-bot

pytorch-bot bot commented Sep 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135702

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 Cancelled Jobs, 5 Unrelated Failures

As of commit eddebed with merge base 6700175 (image):

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jataylo jataylo added the topic: not user facing label Sep 11, 2024
@jataylo jataylo marked this pull request as ready for review September 11, 2024 16:22
@jataylo
Collaborator Author

jataylo commented Sep 12, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Sep 12, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jataylo
Collaborator Author

jataylo commented Sep 12, 2024

Hmm, the failures are probably not related. I'll rebase and see if the checks go green.

@jataylo
Collaborator Author

jataylo commented Sep 12, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased tf32-precision-rocm-inducto onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout tf32-precision-rocm-inducto && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the tf32-precision-rocm-inducto branch from 748e495 to eddebed Compare September 12, 2024 15:13
@jithunnair-amd
Collaborator

@pytorchbot merge -f "Fix ROCm CI failures in inductor/test_flex_encoding.py"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
pytorch@3bebc09

A recent change to flex_attention allowed TF32 precision. TF32 largely lacks support on the ROCm side, so fp32 in flex attention should use ieee precision there.

Pull Request resolved: pytorch#135702
Approved by: https://github.com/jeffdaily, https://github.com/drisspg
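
The precision choice described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `select_fp32_dot_precision` and its parameters are hypothetical. In PyTorch the two inputs could plausibly come from `torch.get_float32_matmul_precision()` and `torch.version.hip`, and the returned strings are assumed to match the `input_precision` values accepted by Triton's `tl.dot`.

```python
from typing import Optional


def select_fp32_dot_precision(matmul_precision: str, hip_version: Optional[str]) -> str:
    """Pick the input precision for fp32 dot products (hypothetical helper).

    matmul_precision: the global fp32 matmul setting, e.g. "highest" or "high".
    hip_version: the HIP version string on a ROCm build, or None otherwise.
    """
    # "highest" requests full fp32 accuracy, so TF32 is ruled out everywhere.
    # A non-None hip_version marks a ROCm build, where TF32 largely lacks
    # support, so the sketch falls back to IEEE fp32 there as well.
    if matmul_precision == "highest" or hip_version is not None:
        return "ieee"
    # Otherwise the faster, lower-precision TF32 path is allowed.
    return "tf32"
```

Under these assumptions, a ROCm build always gets `"ieee"` regardless of the global matmul precision setting, while other builds get `"tf32"` only when the user has relaxed precision below `"highest"`.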
jithunnair-amd added a commit that referenced this pull request Sep 24, 2024
@jithunnair-amd
Collaborator

The cherry pick PR is at #136557

atalman pushed a commit that referenced this pull request Sep 25, 2024
* [ROCm] skip test_fp8_cast_and_t on non-MI300 machines (#135917)

Fixes #ISSUE_NUMBER

Pull Request resolved: #135917
Approved by: https://github.com/malfet

(cherry picked from commit 6cdc70b)

* Skip pointwise associative scan tests due to regression (changes based on PR #135995)

* Cherry-pick fix from #135702

---------

Co-authored-by: Prachi Gupta <prachi.gupta@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>

Labels

ciflow/inductor
ciflow/inductor-rocm - Trigger "inductor" config CI on ROCm
ciflow/rocm - Trigger "default" config CI on ROCm
ciflow/trunk - Trigger trunk jobs on your pull request
Merged
module: inductor
module: rocm - AMD GPU support for Pytorch
open source
rocm - This tag is for PRs from ROCm team
topic: not user facing - topic category


6 participants