[SymmMem] Refactor NVSHMEM tests: separate Triton tests into dedicated file by codingwithsurya · Pull Request #156685 · pytorch/pytorch · GitHub

Conversation

@codingwithsurya
Contributor

@codingwithsurya codingwithsurya commented Jun 24, 2025

Summary

Moved the Triton-specific NVSHMEM tests from test_nvshmem.py into a dedicated test_nvshmem_triton.py file. The shared Triton JIT kernels now sit at the top level of the new file so that multiple tests can reuse them.

Testing

TORCH_SYMMMEM=NVSHMEM python test/distributed/test_nvshmem.py
TORCH_SYMMMEM=NVSHMEM python test/distributed/test_nvshmem_triton.py

All 16 original tests pass with no functionality changes.
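
For anyone who prefers to script the two commands above, a minimal sketch follows. It assumes it is run from the repository root, and `build_test_runs` and `run_all` are hypothetical helper names used only for illustration; they are not part of PyTorch.

```python
import os
import subprocess
import sys

# Test files named in the Testing section above.
TEST_FILES = [
    "test/distributed/test_nvshmem.py",
    "test/distributed/test_nvshmem_triton.py",
]


def build_test_runs(base_env=None):
    """Return one (env, argv) pair per test file, mirroring the commands above."""
    env = dict(os.environ if base_env is None else base_env)
    # Same variable the shell commands set inline.
    env["TORCH_SYMMMEM"] = "NVSHMEM"
    return [(env, [sys.executable, path]) for path in TEST_FILES]


def run_all(runner=None):
    """Run every test file and return the exit codes.

    `runner` is an injectable callable (argv, env) -> int so the helper can be
    dry-run on a machine without GPUs or NVSHMEM.
    """
    if runner is None:
        def runner(argv, env):
            return subprocess.run(argv, env=env).returncode
    return [runner(argv, env) for env, argv in build_test_runs()]
```

Calling `run_all()` launches both files one after the other. Passing a stub such as `lambda argv, env: 0` only exercises the command construction.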

Stack from ghstack (oldest at bottom):

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

@pytorch-bot

pytorch-bot bot commented Jun 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156685

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 4ca87e9 with merge base 455dfd2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@codingwithsurya codingwithsurya self-assigned this Jun 24, 2025
@codingwithsurya codingwithsurya changed the title from "pulling tests and kernels out of test_nvshmem and putting into seperate file" to "[SymmMem] Refactor NVSHMEM tests: separate Triton tests into dedicated file" Jun 24, 2025
@codingwithsurya codingwithsurya added the "release notes: distributed (c10d)" label and removed the "topic: not user facing" label Jun 24, 2025
@pytorch-bot pytorch-bot bot added the "topic: not user facing" label Jun 24, 2025
@codingwithsurya codingwithsurya force-pushed the gh/codingwithsurya/7/head branch from 3d624e8 to 20b9dcf on June 24, 2025 16:45
[ghstack-poisoned]
codingwithsurya added a commit that referenced this pull request Jun 24, 2025
ghstack-source-id: 52c84f0
Pull-Request: #156685
@fduwjj
Contributor

fduwjj commented Jun 24, 2025

Can you also update .github/workflows/h100-distributed.yml so the new test is included in CI as well? Thanks

@codingwithsurya
Contributor Author

codingwithsurya commented Jun 24, 2025

Can you also update .github/workflows/h100-distributed.yml so the new test is included in CI as well? Thanks

sure, just updated it

@codingwithsurya codingwithsurya added the "release notes: distributed (symm_mem)" label and removed the "topic: not user facing" label Jun 24, 2025
[ghstack-poisoned]
@codingwithsurya codingwithsurya requested a review from a team as a code owner June 25, 2025 18:33
[ghstack-poisoned]
codingwithsurya added a commit that referenced this pull request Jun 25, 2025
ghstack-source-id: 8803f4d
Pull-Request: #156685
@codingwithsurya
Contributor Author

Removed the unnecessary tl.constexpr annotations in the latest commit and updated the CI workflow so the new test runs.


@mandroid6 mandroid6 left a comment


Assuming no changes apart from the triton test splits + CI tests?

@codingwithsurya
Contributor Author

Assuming no changes apart from the triton test splits + CI tests?

Right, plus the tl.constexpr changes we were discussing yesterday.

[ghstack-poisoned]
codingwithsurya added a commit that referenced this pull request Jun 25, 2025
ghstack-source-id: 11e5f54
Pull-Request: #156685
[ghstack-poisoned]
codingwithsurya added a commit that referenced this pull request Jun 26, 2025
ghstack-source-id: 4a23bde
Pull-Request: #156685
[ghstack-poisoned]
codingwithsurya added a commit that referenced this pull request Jun 26, 2025
ghstack-source-id: 6c892bf
Pull-Request: #156685
@codingwithsurya
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the "ciflow/trunk" label Jun 26, 2025
@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@kwen2501
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: Comment with id 3010523110 not found

Details for Dev Infra team Raised by workflow job

@codingwithsurya
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@github-actions github-actions bot deleted the gh/codingwithsurya/7/head branch July 28, 2025 02:20

Labels

  • ciflow/h100-distributed
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • oncall: distributed (Add this issue/PR to distributed oncall triage queue)
  • release notes: distributed (c10d) (release notes category)
  • release notes: distributed (symm_mem) (release note label for symmetric memory)


5 participants