Fix the parity of original and exported module parameters by supercharleszhu · Pull Request #160600 · pytorch/pytorch · GitHub

Conversation

@supercharleszhu (Contributor) commented Aug 14, 2025

Problem

This PR fixes a parameter-name mismatch that occurs during torch.export in strict mode (see the "How to reproduce the issue?" section below).

When two attributes map to the same tensor, strict mode will:

  1. Build a standard param buffer table to standardize names (the bug happens [here](https://github.com/supercharleszhu/pytorch/blob/f861dc1826f7b49de37a5578d6e9ef6300498606/torch/export/_trace.py#L356): when two parameters share the same id(param), the later name overwrites the earlier one; see the sketch below)
  2. [Update](https://github.com/supercharleszhu/pytorch/blob/f861dc1826f7b49de37a5578d6e9ef6300498606/torch/export/_trace.py#L1481) the export signature with the standardized FQN, which now carries the wrong name
  3. When exported_program.module() is called, it invokes [_unlift_exported_program_lifted_states](https://github.com/supercharleszhu/pytorch/blob/f861dc1826f7b49de37a5578d6e9ef6300498606/torch/export/exported_program.py#L1297) to recover attributes from the export signature, where the parameter names were standardized.
     As a result, named_parameters() of the returned module reports the overwritten name instead of the original one.
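To make the overwrite concrete, here is a minimal sketch of the pattern (hypothetical toy code, not the actual _trace.py logic): a table keyed by id(param) keeps only the last FQN seen for an aliased parameter.

```python
from torch import nn

# Toy module that registers the SAME Linear instance under two attribute names.
lin = nn.Linear(2, 2)
model = nn.Module()
model.a = lin
model.b = lin  # alias

# Buggy pattern: keying by id(param) lets the later FQN overwrite the earlier one.
param_buffer_table = {}
for name, param in model.named_parameters(remove_duplicate=False):
    param_buffer_table[id(param)] = name

print(param_buffer_table[id(lin.weight)])  # -> 'b.weight' ('a.weight' is lost)
```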

How to reproduce the issue?

Reproduction of the issue, shared by @taotaohuang001:

torch version: 2.8.0

```python
import torch
from torch import nn

# ---- Toy model with embedding weight sharing (aliasing) ----
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding_layers = nn.ModuleDict()
        tbl = nn.Embedding(100, 8)
        self.embedding_layers["ActorId"] = tbl
        # Alias: reuse the SAME module instance for another feature
        self.embedding_layers["RootActorId"] = self.embedding_layers["ActorId"]
        self.proj = nn.Linear(16, 1)

    def forward(self, feats: dict[str, torch.Tensor]):
        e1 = self.embedding_layers["ActorId"](feats["ActorId"])
        e2 = self.embedding_layers["RootActorId"](feats["RootActorId"])
        return self.proj(torch.cat([e1, e2], dim=-1))

torch.manual_seed(0)

m = Toy().eval()

# Show pre-export parameter names (canonicalized; shared weight appears once)
print("PRE-EXPORT named_parameters:")
print([name for name, _ in m.named_parameters()])

# Sanity: the two feature names point to the same weight object
w1 = m.embedding_layers["ActorId"].weight
w2 = m.embedding_layers["RootActorId"].weight
print("PRE-EXPORT alias -> same object:", w1 is w2, "| same storage:", w1.data_ptr() == w2.data_ptr())

# Example inputs (dict structure will be captured by export)
ex_in = {
    "ActorId":     torch.randint(0, 100, (4,)),
    "RootActorId": torch.randint(0, 100, (4,)),
}

# ---- Export (in memory) and materialize the runnable module ----
ep = torch.export.export(m, (ex_in,), strict=True)
gm = ep.module()  # GraphModule with new (canonical) parameter names

print("\nPOST-EXPORT named_parameters (GraphModule):")
post_names = [name for name, _ in gm.named_parameters()]
print(post_names)

# Prove alias persists after export: run fwd/bwd and check a single grad tensor exists
out = gm(ex_in).sum()
out.backward()

# Find the embedding weight in the exported module by shape (100, 8)
emb_names = [name for name, p in gm.named_parameters() if p.shape == torch.Size([100, 8])]
print("\nEmbedding param (post-export) canonical name:", emb_names[0] if emb_names else "<not found>")

# Show that only one grad exists for the shared table
for name, p in gm.named_parameters():
    if p.grad is not None and p.shape == torch.Size([100, 8]):
        print("Grad present on shared embedding weight:", name, "| grad shape:", tuple(p.grad.shape))
        break

```

Running this, you will see that the parameter names differ before and after export:

```

PRE-EXPORT named_parameters:
['embedding_layers.ActorId.weight', 'proj.weight', 'proj.bias']
PRE-EXPORT alias -> same object: True | same storage: True

POST-EXPORT named_parameters (GraphModule):
['embedding_layers.RootActorId.weight', 'proj.weight', 'proj.bias']

Embedding param (post-export) canonical name: embedding_layers.RootActorId.weight
Grad present on shared embedding weight: embedding_layers.RootActorId.weight | grad shape: (100, 8)
```
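With the fix, the exported module keeps the original canonical names, so appending a quick check to the repro script above (reusing its m and post_names) should pass:

```python
# After the fix, pre- and post-export parameter names agree.
pre_names = [name for name, _ in m.named_parameters()]
assert set(pre_names) == set(post_names), (pre_names, post_names)
print("Parameter names preserved:", sorted(post_names))
```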

Solution

This PR fixes the issue by making sure that a later named parameter does not overwrite an existing entry in the param_buffer_table when the original model's parameter already has a mapped name, as sketched below.
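In spirit, the guarded update looks like this (a sketch under the same toy setup as above, not the exact diff):

```python
from torch import nn

lin = nn.Linear(2, 2)
model = nn.Module()
model.a = lin
model.b = lin  # alias

# Guarded update: record a name only the first time a parameter object is
# seen, so the original FQN wins for aliased parameters.
param_buffer_table = {}
for name, param in model.named_parameters(remove_duplicate=False):
    param_buffer_table.setdefault(id(param), name)

print(param_buffer_table[id(lin.weight)])  # -> 'a.weight' (original name preserved)
```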

@pytorch-bot bot commented Aug 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160600

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e1a841b with merge base 83283ce:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cpuhrsch added the triaged label Aug 14, 2025
@angelayi (Contributor) left a comment

thanks so much for the fix! could you add a test to check for this behavior?

@facebook-github-bot commented

@angelayi has imported this pull request. If you are a Meta employee, you can view this in D80750328.

@pytorch-bot bot added and then removed the ciflow/trunk label Aug 21, 2025
@facebook-github-bot commented

@angelayi has imported this pull request. If you are a Meta employee, you can view this in D80750328.

@pytorch-bot bot added the ciflow/trunk label Aug 22, 2025
@angelayi (Contributor) left a comment

thanks so much!

@supercharleszhu (Contributor, Author) commented

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@pytorch-bot bot removed the ciflow/trunk label Aug 22, 2025
@supercharleszhu (Contributor, Author) commented

@pytorchbot merge

@pytorch-bot bot commented Aug 22, 2025

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@facebook-github-bot commented

@angelayi has imported this pull request. If you are a Meta employee, you can view this in D80750328.

@pytorch-bot bot added the ciflow/trunk label Aug 22, 2025
@pytorch-bot bot removed the ciflow/trunk label Aug 22, 2025
@facebook-github-bot commented

@angelayi has imported this pull request. If you are a Meta employee, you can view this in D80750328.

@pytorch-bot bot added the ciflow/trunk label Aug 22, 2025
@supercharleszhu (Contributor, Author) commented

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorch-bot bot removed the ciflow/trunk label Aug 25, 2025
@facebook-github-bot commented

@angelayi has imported this pull request. If you are a Meta employee, you can view this in D80750328.

@pytorch-bot bot added the ciflow/trunk label Aug 25, 2025
@supercharleszhu (Contributor, Author) commented

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

Pull Request resolved: pytorch#160600
Approved by: https://github.com/angelayi

Labels: ciflow/trunk, Merged, open source, release notes: export, triaged
