[inductor][memory] restructuring memory.py and turn on the flag by xuanzhang816 · Pull Request #137205 · pytorch/pytorch · GitHub

Conversation

@xuanzhang816
Contributor

@xuanzhang816 xuanzhang816 commented Oct 2, 2024

@pytorch-bot

pytorch-bot bot commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137205

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3cba60a with merge base 10a34dc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@xuanzhang816
Copy link
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the "topic: not user facing" label Oct 2, 2024
succ_nodes: OrderedSet[BaseSchedulerNode] = dataclasses.field(
default_factory=OrderedSet
)
outdegree: int = 0 # this is used only in topological_sort_lpmf
Contributor Author

In MemoryPlanningInfoForBuffer, I am only keeping the attributes that are static. Dynamically set attributes (e.g., outdegree) are now tracked locally within the function that needs them.

Similarly, I removed memory_to_free and indegree from MemoryPlanningInfoForNode.
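The restructuring described above (static attributes stay on the dataclass; dynamic counters live in the function that needs them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual memory.py code: `PlanningInfo`, `topological_sort`, and the plain `frozenset`/`dict` types stand in for `MemoryPlanningInfoForBuffer`, `topological_sort_lpmf`, and `OrderedSet`.

```python
import dataclasses
from collections import deque

@dataclasses.dataclass
class PlanningInfo:
    # Static attribute: fixed once the dependency graph is built.
    succ_nodes: frozenset = frozenset()

def topological_sort(nodes, info):
    # Dynamic state (outdegree) is computed and tracked locally,
    # inside the function that needs it, instead of being stored
    # as a mutable field on the dataclass.
    outdegree = {n: len(info[n].succ_nodes) for n in nodes}
    ready = deque(n for n in nodes if outdegree[n] == 0)  # sinks first
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        # Note: this predecessor scan is O(n^2) overall; a real
        # implementation would keep predecessor lists to avoid it.
        for p in nodes:
            if n in info[p].succ_nodes:
                outdegree[p] -= 1
                if outdegree[p] == 0:
                    ready.append(p)
    order.reverse()  # sources first
    return order

info = {
    "a": PlanningInfo(frozenset({"b"})),
    "b": PlanningInfo(frozenset({"c"})),
    "c": PlanningInfo(),
}
print(topological_sort(list(info), info))  # ['a', 'b', 'c']
```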

@xuanzhang816 xuanzhang816 marked this pull request as ready for review October 3, 2024 17:26
@xuanzhang816 xuanzhang816 requested a review from eellison October 3, 2024 17:26
@xuanzhang816 xuanzhang816 changed the title Restructuring the memory pass [inductor][memory] restructuring memory.py Oct 4, 2024
@xuanzhang816 xuanzhang816 changed the title [inductor][memory] restructuring memory.py [inductor][memory] restructuring memory.py and turn on the flag Oct 4, 2024
@xuanzhang816 xuanzhang816 requested a review from yf225 October 4, 2024 22:01
Contributor

@eellison eellison left a comment

Looks great! I had one concern about a potential O(n^2), but it turns out that is not a real issue (commenting on the other PR for continuity). Can we please submit a dashboard run first? If we are at all worried about the riskiness of the change, it can also make sense to add a JK for enablement, or to enable in OSS first and then in fbcode.

]

# enable operator reordering for peak memory optimization
reorder_for_peak_memory = os.environ.get("TORCHINDUCTOR_REORDER_FOR_PEAK_MEMORY") == "1"
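The config line above reads the flag from the environment at import time ("1" means enabled). A minimal sketch of that pattern, assuming a hypothetical `read_flag` helper and a stand-in `schedule` gate (neither is in the actual inductor config):

```python
import os

def read_flag(name: str, default: bool = False) -> bool:
    """Read a boolean config flag from the environment, as the
    config line above does; "1" enables, anything else disables."""
    val = os.environ.get(name)
    return default if val is None else val == "1"

# TORCHINDUCTOR_REORDER_FOR_PEAK_MEMORY is the real env var from
# the config above; setting it here is just for demonstration.
os.environ["TORCHINDUCTOR_REORDER_FOR_PEAK_MEMORY"] = "1"
print(read_flag("TORCHINDUCTOR_REORDER_FOR_PEAK_MEMORY"))  # prints True
```

Reading the flag once at import time keeps the check cheap, at the cost that changing the environment variable after import has no effect on the cached module-level value.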
Contributor

Did we do a dashboard run yet ? could we do that before landing ?

Contributor

It's easiest to do via ghstack, but I think you can also push to origin/main or something similar (I forget exactly); I would recommend ghstack.

If you've already submitted this PR, the easiest way to do that is `git reset --soft HEAD~1; git commit -m "tmp"; ghstack`, and then benchmark this commit as another PR.

Contributor Author

Let me try to figure this out.

Contributor

Sorry, one other thing: we should test out the various implementations on CI runs as well, if possible, unless you already have a clear idea that one is Pareto optimal (compilation time, average memory change, worst memory change).

Contributor Author

I submitted a run here. Seems that it takes a few days for the results to show up in the dashboard. I will report back once the results show up.

Contributor Author

@xuanzhang816 xuanzhang816 Oct 10, 2024

> if we are at all worried about riskiness of change it can also make sense to add a JK for enablement, or enable in OSS first then fbcode

I was initially very concerned about this, but as I try the flag on more internal models, I am becoming less concerned. There are still many potential situations I am not aware of, though, so I think it would be best to leave this decision to the PyTorch team; you are the experts here :) One possibility is that once we turn it on, I can monitor it carefully for a while?

Contributor Author

It seems all the dashboard runs are done. This is the screenshot of the summary (nothing seems significant).

[screenshot: dashboard run summary]

Individual breakdowns can be found here:

@xuanzhang816 xuanzhang816 requested a review from eellison October 17, 2024 21:41
@xuanzhang816
Contributor Author

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the "ciflow/trunk" label Oct 23, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@xuanzhang816
Contributor Author

@pytorchbot drci

@xuanzhang816
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_distributed, 1, 1, linux.g5.12xlarge.nvidia.gpu)

Details for Dev Infra team Raised by workflow job

@pytorch-bot pytorch-bot bot added the "oncall: distributed" label Oct 24, 2024
@xuanzhang816
Contributor Author

@pytorchmergebot merge -f "remaining tests queued for too long"

@pytorch pytorch deleted a comment from pytorch-bot bot Oct 25, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.



Labels

ciflow/inductor, ciflow/trunk, Merged, module: dynamo, module: inductor, oncall: distributed, topic: not user facing
