[inductor] Limit cpu copies in autotuning to CUDA devices by masnesral · Pull Request #137509 · pytorch/pytorch · GitHub

@masnesral
Contributor

@masnesral masnesral commented Oct 8, 2024

Stack from ghstack (oldest at bottom):

Summary: Follow-up to a comment missed in #136701: this optimization should be performed only for mutated args on CUDA devices.

Test Plan: `python benchmarks/dynamo/timm_models.py --performance --inductor --device cuda --inference --bfloat16 --print-compilation-time --print-memory --cold-start-latency --only fbnetc_100`
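
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of the condition the summary describes: during autotuning, make a CPU-side copy of an argument only if it is both mutated by the kernel and on a CUDA device. Everything here is hypothetical and is not Inductor's code: `Arg` is a plain stand-in for a tensor (so the sketch runs without PyTorch), and `select_args_to_copy` and the argument names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Arg:
    """Hypothetical stand-in for a tensor: a name, a device type, and a payload."""
    name: str
    device: str  # "cuda" or "cpu" in this sketch
    data: list


def select_args_to_copy(args, mutated_arg_names):
    """Return {name: copied data} only for mutated args that are on a CUDA device.

    Assumption: non-mutated args need no saved copy, and args that are not on
    a CUDA device are skipped, as the PR summary describes.
    """
    copies = {}
    for arg in args:
        if arg.name in mutated_arg_names and arg.device == "cuda":
            copies[arg.name] = list(arg.data)  # independent copy of the payload
    return copies


args = [
    Arg("in_out_ptr0", "cuda", [1.0, 2.0]),  # mutated, on CUDA: copied
    Arg("in_ptr0", "cuda", [3.0]),           # not mutated: skipped
    Arg("out_ptr0", "cpu", [4.0]),           # mutated, but not on CUDA: skipped
]
saved = select_args_to_copy(args, {"in_out_ptr0", "out_ptr0"})
print(sorted(saved))
```

In this sketch only `in_out_ptr0` is copied; the mutated CPU argument is left alone, which is the case the summary says the earlier change missed.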

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@pytorch-bot

pytorch-bot bot commented Oct 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137509

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9f297f6 with merge base 9b2e453:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

masnesral added a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 3c0c6f2
Pull Request resolved: #137509
@masnesral masnesral requested review from eellison and int3 October 8, 2024 18:44
@masnesral masnesral added the topic: not user facing topic category label Oct 8, 2024
Contributor

@int3 int3 left a comment

Thank you!

@masnesral masnesral added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 8, 2024
@masnesral
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@github-actions github-actions bot deleted the gh/masnesral/121/head branch November 9, 2024 02:02
4 participants