NUMA binding integration with elastic agent and torchrun #149334
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149334
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 066a805 with merge base ee72338. UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks @raghavhrishi! Can you please sign the CLA?
torch/distributed/numa_binding.py (outdated)
    return numactlargs

class CoreComplex(Numa):
This option seems to be specific to AMD x86_64 processors, which have the concept of a core complex whose cores share L3 cache.
On Intel x86_64 processors, the L3 cache is typically at the granularity of a socket.
L1 & L2 caches are private to each physical core.
Would it be okay to disable this option on Intel x86_64 machines (I'm guessing users would only use it by mistake there), or to explain the behavior with a warning if it is used on an Intel x86_64 machine? @jingxu10, can you please share your opinion?
Thanks!
A warning message can be added when the core-complex option is used, and a note can be added to the help page (where the --numa_binding option is described), so that users are aware of it.
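For illustration, such a warning could be keyed off the CPU vendor string; a minimal sketch (hypothetical helper, not the PR's actual code):

```python
import warnings

def _warn_if_core_complex_on_intel() -> None:
    # Hypothetical helper: core-complex binding targets AMD-style CCX topology,
    # where a few cores share an L3 slice. On Intel parts the L3 is typically
    # socket-wide, so the option is unlikely to behave as the user expects.
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except OSError:
        return
    if "GenuineIntel" in cpuinfo:
        warnings.warn(
            "--numa_binding=core-complex assumes AMD-style core complexes sharing "
            "an L3 cache; on Intel CPUs the L3 is usually shared by the whole "
            "socket, so this option may have no benefit."
        )
```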
This has been updated in the recent commit.
Shall we consider the E/P (hybrid) core case?
https://www.intel.com/content/www/us/en/gaming/resources/how-hybrid-design-works.html
Shall we consider the E/P (hybrid) core case?
Thanks for your inputs, @jingxu10!
It looks like some variants of newer data-center-grade Xeon processors may have E cores as well, so we should probably consider them too.
@leslie-fang-intel, please share your inputs. Thanks!
Thanks for sharing your thoughts – it's a good idea. It could potentially be a follow-up Pull Request once we’ve had the time to consider the design and how best to integrate it.
cc: @arpitsardhana
torch/distributed/numa_binding.py (outdated)
    resultCpuList = []
    for i in range(resultCpuLen):
        if (cpusSharedCacheVal >> i) & 1 == 1:
            resultCpuList.append(i)
@raghavhrishi First, great job pushing on this feature!
Referencing your example in #148689, for the exclusive binding option,
If Rank 0 and Rank 1 are both affined to NUMA Node 0, the cores would be split as follows:
Rank 0:
numactl --physcpubind=0-3 --membind=0
Rank 1:
numactl --physcpubind=4-7 --membind=0
This assumes that a contiguous indexing of CPUs would result in the most optimal binding. Could you please confirm if this is indeed an assumption in this PR? In my experience, there are many node architectures where linear indexing of CPUs is not the norm; see Frontier, e.g., https://docs.olcf.ornl.gov/systems/frontier_user_guide.html#frontier-compute-nodes
If linear indexing of CPUs is indeed assumed, would it be possible to have a user option to specify --physcpubind or pass in the CPU/GPU topology?
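To make the assumption concrete, the split in the example above could be produced by something like the following (a sketch with hypothetical helper names, not the PR's actual code):

```python
def build_exclusive_numactl_args(numa_node, node_cpus, ranks_on_node, rank_index):
    """Hypothetical sketch: split a NUMA node's CPUs contiguously among ranks.

    numa_node: NUMA node id the rank is affined to
    node_cpus: sorted CPU ids belonging to that node (assumed contiguous here)
    ranks_on_node: number of ranks sharing this node
    rank_index: this rank's 0-based position among those ranks
    """
    per_rank = len(node_cpus) // ranks_on_node
    cpus = node_cpus[rank_index * per_rank:(rank_index + 1) * per_rank]
    # The "first-last" range syntax is only correct because the slice is
    # contiguous -- exactly the assumption being questioned above.
    return ["numactl", f"--physcpubind={cpus[0]}-{cpus[-1]}", f"--membind={numa_node}"]

# Two ranks on NUMA node 0 with CPUs 0-7:
print(build_exclusive_numactl_args(0, list(range(8)), 2, 0))  # --physcpubind=0-3
print(build_exclusive_numactl_args(0, list(range(8)), 2, 1))  # --physcpubind=4-7
```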
@ashesh2512 Thanks for your comment!
Non-linear core indexing might have edge cases in scenarios where there are only two NUMA Nodes available for binding, and multiple ranks (e.g., 4) are affined to the same NUMA Node. In such cases, linear indexing might be necessary to address the issue effectively.
The exclusive binding strategy utilizes topology information to determine the NUMA Node associated with each rank. Once identified, it ensures that ranks affined to the same NUMA Node are assigned distinct sets of cores using physcpubind, preventing overlap. This approach ensures that ranks sharing affinity with a NUMA Node do not use the same cores. The strategy uses the system's underlying topology information and avoids cross-NUMA binding.
As a potential enhancement, we could consider adding an option for users to specify the cores they wish to use in a follow-up pull request after reviewing the design.
cc: @arpitsardhana
@raghavhrishi Thanks, I think that an option for users to specify the cores they wish to use would be ideal in a follow up PR. I could help with that.
For context, on one of the architectures I work with, a single compute node (8 GPUs per node) has the following CPU/GPU affinity. Ideally, the user would be able to bind a process to one or multiple cores and set the GPU index in PyTorch accordingly (a rough sketch of such an explicit mapping follows the listing).
NUMA 0:
hardware threads 000-007, 064-071 | GPU 4
hardware threads 008-015, 072-079 | GPU 5
NUMA 1:
hardware threads 016-023, 080-087 | GPU 2
hardware threads 024-031, 088-095 | GPU 3
NUMA 2:
hardware threads 032-039, 096-103 | GPU 6
hardware threads 040-047, 104-111 | GPU 7
NUMA 3:
hardware threads 048-055, 112-119 | GPU 0
hardware threads 056-063, 120-127 | GPU 1
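For what it's worth, a user-supplied topology for a node like the one above could be as small as a table from local rank to (NUMA node, CPU list), which the launcher then turns into numactl arguments. A sketch under that assumption (the table format and helper are hypothetical, not an agreed-on interface):

```python
# Hypothetical explicit affinity table for the 8-GPU node described above.
# local rank (chosen here to equal the GPU index): (NUMA node, hardware threads to bind)
EXPLICIT_AFFINITY = {
    0: (3, "48-55,112-119"),
    1: (3, "56-63,120-127"),
    2: (1, "16-23,80-87"),
    3: (1, "24-31,88-95"),
    4: (0, "0-7,64-71"),
    5: (0, "8-15,72-79"),
    6: (2, "32-39,96-103"),
    7: (2, "40-47,104-111"),
}

def numactl_prefix(local_rank: int) -> list[str]:
    # The launcher would prepend this to the trainer command; the trainer
    # itself would still select the matching GPU index.
    node, cpus = EXPLICIT_AFFINITY[local_rank]
    return ["numactl", f"--physcpubind={cpus}", f"--membind={node}"]

print(numactl_prefix(0))  # ['numactl', '--physcpubind=48-55,112-119', '--membind=3']
```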
requirements.txt (outdated)
    psutil
    pyyaml
    requests
    pynvml
Nope, we deliberately decided not to depend on pynvml, as one can very easily rewrite everything one needs with ctypes.
Moreover, it's a big no-go for something like ROCm or XPU.
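For reference, the kind of ctypes replacement being described can be quite small. A sketch (assumes libnvidia-ml.so.1 and the standard NVML entry points nvmlInit_v2/nvmlDeviceGetCount_v2; not the PR's actual code, and still NVIDIA-only):

```python
import ctypes

def get_gpu_count_without_pynvml() -> int:
    # Talk to NVML directly via ctypes instead of importing pynvml.
    # ROCm/XPU would need a different path entirely.
    try:
        nvml = ctypes.CDLL("libnvidia-ml.so.1")
    except OSError:
        return 0
    if nvml.nvmlInit_v2() != 0:  # NVML_SUCCESS == 0
        return 0
    try:
        count = ctypes.c_uint(0)
        if nvml.nvmlDeviceGetCount_v2(ctypes.byref(count)) != 0:
            return 0
        return count.value
    finally:
        nvml.nvmlShutdown()
```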
torch/distributed/numa_binding.py (outdated)
    def get_gpu_count(self):
        # Initialize NVML
        pynvml.nvmlInit()
        # Get the number of GPU devices
        device_count = pynvml.nvmlDeviceGetCount()
        # Shutdown NVML
        pynvml.nvmlShutdown()
Is there a device-generic way?
There should be some methods in torch.accelerator package now.
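For illustration, a device-generic count via that package could look like this (a sketch; assumes a PyTorch recent enough to ship `torch.accelerator`, and is not the snippet ultimately used in the PR):

```python
import torch

def get_accelerator_count() -> int:
    # Prefer the device-generic accelerator API when present,
    # falling back to CUDA-specific counting otherwise.
    if hasattr(torch, "accelerator") and torch.accelerator.is_available():
        return torch.accelerator.device_count()
    if torch.cuda.is_available():
        return torch.cuda.device_count()
    return 0
```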
what do you think of this?
torch/distributed/numa_binding.py (outdated)
    # returns an array indexed by GPU id, mapping to the NUMA node id
    def get_numa_nodes(self):
nit: would appreciate an example of the return.
Suppose we have 4 GPUs, and they are connected to the following NUMA nodes:
GPU 0 → NUMA Node 0
GPU 1 → NUMA Node 0
GPU 2 → NUMA Node 1
GPU 3 → NUMA Node 1
Then the function would return:
[0, 0, 1, 1]
torch/distributed/numa_binding.py (outdated)
    for busID in pciBusIDs:
        pciFields = busID.split(":")
        pciDir = f"{pciFields[0][-4:]}:{pciFields[1]}:{pciFields[2]}"
        numaFile = NUMA_CMD.format(value=pciDir.lower())
        try:
            with open(numaFile) as numa_node_text:
                node = int(numa_node_text.read())
                numaNodes.append(node)
        except FileNotFoundError:
            print(f"The file {numaFile} does not exist.")
nit: can you comment on this block?
Also, would it be worth asking NVML to add an API that returns the needed value?
For each GPU's PCI bus ID, this constructs the sysfs path to its numa_node file and reads the NUMA node associated with it. The function returns a list with the NUMA node of each GPU.
torch/distributed/numa_binding.py (outdated)
    # returns, for each core, a bitmap of its sibling cores
    def get_thread_siblings(self, cpu):
What is the function for?
get_thread_siblings identifies which other CPUs (cores) are on the same NUMA node as the current CPU.
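For context, one way to obtain such a sibling bitmap on Linux is via sysfs; a minimal sketch (the path and parsing are assumptions about the environment, not necessarily the PR's implementation):

```python
def read_thread_siblings(cpu: int) -> list[int]:
    # Sketch: read the sibling mask for `cpu` straight from sysfs. The file
    # holds a comma-separated hex CPU mask; decode it into a list of CPU ids.
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings"
    with open(path) as f:
        mask = int(f.read().strip().replace(",", ""), 16)
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]
```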
Super!!! Thank you for implementing this, @raghavhrishi. This issue could be closed as well when merged: #115305
Excited for this @raghavhrishi. Any update on the timeline?
@kwen2501: I've addressed the comments in the PR. Please let me know if there's anything else needed to proceed with the merge.
Thanks for the improvements. In general, I am wondering if there is a way to do this in a device-agnostic way, but I understand. If we'd like to avoid a direct dependency on pynvml (as you did in requirements.txt), can we put a check in torchrun to see if it is available? I will defer to @kiukchung and @d4l3k for the final decision.
torch/distributed/run.py (outdated)
    numa_cmd = None
    py_executable = os.getenv("PYTHON_EXEC", sys.executable)
    if args.numa_binding:
        numa_cmd = update_with_numa_binding_pytorch(args.numa_binding)
Can we implement this at the elastic agent level? Putting this logic here means only CLI users can get numa control and not via the programmatic API
@raghavhrishi do you have bandwidth to update this PR? There's still some refactoring required to get this into a good state. The two main things are:
Primarily asking since we'd like to land this support and have someone who might be interested in pushing this over the line. We could also land this in pieces -- i.e., land the helper utilities and then follow up with a cleaner torchelastic integration.
| "Can be used to override custom logging behavior.", | ||
| ) | ||
|
|
||
| parser.add_argument( |
cc @EikanWang didn't someone from your end already send a PR to do NUMA binding in torchrun? That vaguely rings a bell to me.
Hi @albanD , do you mean the script https://github.com/pytorch/pytorch/blob/main/torch/backends/xeon/run_cpu.py or #133835 or a separate recent PR?
As discussed with Nikita before, the code changes in #133835 involve too many things; I'll split them into smaller PRs later.
Ho yes, https://github.com/pytorch/pytorch/blob/main/torch/backends/xeon/run_cpu.py is what I had in mind. @raghavhrishi any link between the two?
@albanD I think the file you mentioned is different from this PR's implementation.
setup.py (outdated)
    "networkx",
    "jinja2",
    "fsspec",
    "pynvml>=11.4.1",
Adding such a hard dependency is definitely not ok without much deeper considerations.
Force-pushed: dc8b4f6 to b8ba819
@pdesupinski has imported this pull request. If you are a Meta employee, you can view this in D78319234.
Force-pushed: b8ba819 to 066a805
@pdesupinski has imported this pull request. If you are a Meta employee, you can view this in D78319234.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Implements #148689
Pull Request resolved: #149334
Approved by: https://github.com/d4l3k
Co-authored-by: Paul de Supinski <pdesupinski@gmail.com>
# Context
This is an extension of #149334.

# This PR
Add support for NUMA bindings with Callable entrypoints, such as `do_train` instead of `/usr/local/bin/python`.

Most notably, we utilize a hack in order to force `Process.start()` to use custom NUMA bindings for each subprocess. Please search for `HACK:` in the code to see a description of the implementation we chose, and #160006 for discussion of alternatives and why this is necessary.

Other changes:
* Remove unnecessary `--preferred` option from all binding strategies. By default, Linux already allocates memory to the NUMA node local to the CPU which triggered the allocation. (See [MPOL_LOCAL](https://man7.org/linux/man-pages/man2/set_mempolicy.2.html).)
* Refactor so that the main API is `maybe_wrap_command_with_numa_bindings`, which computes bindings for a single rank at a time, rather than `maybe_wrap_with_numa_bindings`, which computed bindings for all ranks at once. This allowed for more code sharing between `Callable` and `str` entrypoints.

# Test Plan
## Automated
`$ pytest test/test_numa_binding.py`

## Manual
Using [this benchmark](https://gist.github.com/pdesupinski/bbe01ade455d86e989794f2c612e2d91), ran

```
$ PYTHONUNBUFFERED=1 LOGLEVEL=INFO perf stat -e ls_dmnd_fills_from_sys.dram_io_far,ls_dmnd_fills_from_sys.dram_io_near -- python -m torch.distributed.run --standalone --nproc-per-node=8 --numa-binding=node --run-path mlp_train.py 2>&1 | tee node_callable.txt && PYTHONUNBUFFERED=1 LOGLEVEL=INFO perf stat -e ls_dmnd_fills_from_sys.dram_io_far,ls_dmnd_fills_from_sys.dram_io_near -- python -u -m torch.distributed.run --standalone --nproc-per-node=8 --run-path mlp_train.py 2>&1 | tee none_callable.txt
```

and observed
* 6.6% remote memory accesses with 'node' bindings
* 11.6% remote without bindings

I also ran similar with `str` entrypoints as before just to be sure it's still working.

NOTE: [--run-path triggers the code to be run inside a `Callable`.](https://github.com/pytorch/pytorch/blob/017259f9c65b6fad55fb9597d7077e2543eaae46/torch/distributed/run.py#L870)

Pull Request resolved: #160163
Approved by: https://github.com/d4l3k
…60848)

# Context
Another fix to enable broad rollout of #149334. The implementation assumes that the trainer process with local rank `n` only uses device `cuda:n`. However, there are sometimes jobs with more than one GPU per process, in which case our assumption could be incorrect and actually lead to worse memory locality.

# This PR
As titled.

Pull Request resolved: #160848
Approved by: https://github.com/kiukchung
Implements #148689
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @kwen2501 @c-p-i-o