Use source hashing to generate consistent symbolic ids #149665
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149665. Note: Links to docs will display an error until the docs builds have been completed.
As of commit ae47187 with merge base 8b04364: ⏳ 1 Pending, 5 Unrelated Failures.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed; the first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda12_6-build / build, linux-binary-manywheel / manywheel-py3_9-cuda11_8-build / build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 0 checks. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…non args only (#150955) PR #149665 made a change to optimized_add that is causing an issue internally. In general, make_optimized should only be called with valid new_args; new_args can become None when an element already exists, and we should break out of the loop in that case. Note that I also kept the optimized summation only when both the lhs and rhs lengths are <= 2. This is OK because the optimization is based on the inductive property of adding one symbol at a time; the [2]+[2] case serves as the base case (we could probably remove it as well). Keeping it for all sizes, while correct, is not obviously as efficient (we would do N log(N) insertions), and there is no current justification for that. Pull Request resolved: #150955 Approved by: https://github.com/Mingming-Ding, https://github.com/atalman, https://github.com/bobrenjc93
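To make the intended guard concrete, here is a minimal, self-contained sketch of the pattern described above. The helper names and stub bodies (try_merge, generic_add, the tuple return values) are hypothetical illustrations, not the actual PyTorch implementation of optimized_add/make_optimized:

```python
# Hypothetical sketch of the guard described in the commit message above;
# names mirror the message but this is not the real PyTorch code.

from typing import Optional


def make_optimized(new_args: list) -> tuple:
    # Stand-in for building the "optimized" sum node from already-merged args.
    assert new_args is not None, "must only be called with valid new_args"
    return ("optimized_sum", tuple(new_args))


def generic_add(lhs_args: list, rhs_args: list) -> tuple:
    # Stand-in for the unoptimized fallback path.
    return ("sum", tuple(sorted(lhs_args + rhs_args, key=str)))


def try_merge(lhs_args: list, rhs_args: list) -> Optional[list]:
    """Insert rhs terms into lhs one at a time, keeping the result sorted.
    Returns None (caller breaks out) if a term already exists."""
    merged = list(lhs_args)
    for term in rhs_args:
        if term in merged:
            return None  # element already exists: bail out of the loop
        merged.append(term)
    return sorted(merged, key=str)


def optimized_add(lhs_args: list, rhs_args: list) -> tuple:
    # Keep the optimized path only for the <=2/<=2 base case; larger inputs
    # would cost ~N log N insertions without a demonstrated benefit.
    if len(lhs_args) <= 2 and len(rhs_args) <= 2:
        new_args = try_merge(lhs_args, rhs_args)
        if new_args is not None:  # call make_optimized only with valid new_args
            return make_optimized(new_args)
    return generic_add(lhs_args, rhs_args)


print(optimized_add(["s1", "s2"], ["s3"]))  # optimized path
print(optimized_add(["s1", "s2"], ["s2"]))  # duplicate term -> fallback
```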
Pull Request resolved: pytorch#149665 Approved by: https://github.com/Mingming-Ding, https://github.com/laithsakka
…ch#149665)" This reverts commit 1f92348. Reverted pytorch#149665 on behalf of https://github.com/malfet due to broken trunk; see https://hud.pytorch.org/hud/pytorch/pytorch/6eb3c2e2822c50d8a87b43938a9cf7ef0561ede2/1?per_page=50&name_filter=linux-focal-cuda12.6-py3.10-gcc11-sm89&mergeLF=true
Stack from ghstack (oldest at bottom):
This PR was inspired by internal models that were cache missing due to PGO. At a high level, the problem looks as follows:
Run 1, Invocation 1: We do a static compile and save some example values to PGO/automatic dynamic.
Run 1, Invocation 2: We detect varying inputs, do a dynamic compile, get a dynamic graph, and save it to PGO. Crucially, what we save to PGO is a superset of what is actually dynamic: if we notice an input varying, we mark it as dynamic in PGO even if that value later gets specialized, and when a value gets specialized we remove its symbol from the graph. This creates an interesting conundrum: even though we produce the same isomorphic graph, PGO makes the second run cache miss. Let's see how...
Run 2, Invocation 1: We fetch the PGO profile, over-mark things as dynamic, get an fx graph, look it up in the cache and... whoops! Cache miss! This happens because the PGO profile causes us to over-allocate symbols. In practice, we end up caching a graph with symbols x: s1, y: s3, but on the second attempt we look it up with x: s1, y: s6, because symbols s3, s4, s5 were all optimistically marked dynamic by PGO and subsequently specialized.
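To illustrate the drift, here is a minimal sketch (made-up source names, not the actual dynamo/ShapeEnv allocator): with sequential allocation, every source that PGO optimistically marks dynamic consumes a symbol id even if it is later specialized away, so the surviving symbols shift between runs and the cache key changes.

```python
# Minimal sketch (hypothetical source names, not the real allocator): sequential
# symbol allocation shifts ids when PGO over-marks inputs as dynamic.
import itertools


def allocate_sequential(dynamic_sources):
    counter = itertools.count(1)
    return {src: f"s{next(counter)}" for src in dynamic_sources}


# Run 1: only x and y end up dynamic -> graph cached under {x: s1, y: s2}.
run1 = allocate_sequential(["L['x'].size()[0]", "L['y'].size()[0]"])

# Run 2: PGO optimistically marks a and b dynamic too; they are specialized
# away later, but they already consumed ids, so y's symbol drifts.
run2 = allocate_sequential([
    "L['x'].size()[0]",
    "L['a'].size()[0]",  # over-marked by PGO, later specialized
    "L['b'].size()[0]",  # over-marked by PGO, later specialized
    "L['y'].size()[0]",
])

print(run1["L['y'].size()[0]"])  # s2
print(run2["L['y'].size()[0]"])  # s4 -> different cache key, cache miss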
We solve this problem by hashing the source names, which gives a reasonably stable symbol assignment across runs. To prevent catastrophic symbol collisions, we resolve any hash collisions with linear probing.
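A minimal sketch of the idea (the id-space size and helper names are assumptions, not the actual ShapeEnv code): derive the symbol id from a stable hash of the source name, probing linearly until a free slot is found, so the same source maps to the same symbol no matter how many other sources were marked dynamic.

```python
# Minimal sketch (assumed id-space size and helper names; not the actual
# ShapeEnv implementation): derive a symbol id from a stable hash of the
# source name, using linear probing to resolve collisions deterministically.
import hashlib

SYMBOL_SPACE = 2**16  # assumed size of the symbol id space


def source_hash(source_name: str) -> int:
    # hashlib is used because Python's built-in hash() is salted per process.
    digest = hashlib.sha256(source_name.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def allocate_symbol(source_name: str, taken: dict) -> str:
    """Map a source name to a stable sN id; probe linearly on collision."""
    idx = source_hash(source_name) % SYMBOL_SPACE
    while idx in taken and taken[idx] != source_name:
        idx = (idx + 1) % SYMBOL_SPACE  # linear probing
    taken[idx] = source_name
    return f"s{idx}"


taken = {}
# The same source gets the same symbol regardless of how many other sources
# were (over-)marked dynamic earlier in the run.
print(allocate_symbol("L['x'].size()[0]", taken))
print(allocate_symbol("L['y'].size()[0]", taken))
```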
cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov