[inductor] Minor refactor of hip compile_meta #143815
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143815
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 1e13b0a with merge base a8ac3a6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
compile_meta["constants"][k] = v
cfg_kwargs = cfg.kwargs
if self.device_props.type == "hip":
    cfg_kwargs = {**cfg_kwargs}
```
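For context, the hunk above makes a shallow copy of `cfg.kwargs` on HIP, presumably so that later HIP-specific adjustments don't mutate the shared config. A minimal sketch of that copy-before-mutate pattern; the function name, `is_hip` flag, and option key below are made up for illustration, not the PR's actual fields:

```python
# Hypothetical illustration of the copy-before-mutate pattern in the hunk above;
# `cfg`, `is_hip`, and the option key are stand-ins, not the PR's actual fields.
def build_cfg_kwargs(cfg, is_hip):
    cfg_kwargs = cfg.kwargs
    if is_hip:
        # Shallow copy so the shared cfg.kwargs dict is left untouched
        cfg_kwargs = {**cfg_kwargs}
        cfg_kwargs["hip_only_option"] = True  # made-up key for illustration
    return cfg_kwargs
```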
Suggested change:

```diff
- cfg_kwargs = {**cfg_kwargs}
+ cfg_kwargs = dict(cfg_kwargs)
```
Any reason why not just the above? Or just explicitly make a shallow copy?
@Skylion007 the version I have is 1.5x faster:
```python
>>> import timeit
>>> other = {1: 2, 3: 4}
>>> timeit.timeit(lambda: dict(other))
0.09024659299757332
>>> timeit.timeit(lambda: {**other})
0.05995289294514805
```
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what about `other.copy()`? Also @jansel, what about as the dict grows? My understanding is that `dict()` may have more initial overhead due to the function call lookup, but better scaling with larger dicts.
Suggested change:

```diff
- cfg_kwargs = {**cfg_kwargs}
+ cfg_kwargs = cfg_kwargs.copy()
```
@Skylion007 `other.copy()` is also slower:

```python
>>> timeit.timeit(lambda: other.copy())
0.075550084002316
```

For large dicts all 3 become the same (though `{**other}` is slightly faster):

```python
>>> large = dict.fromkeys(range(1000))
>>> timeit.timeit(lambda: dict(large))
2.926349258981645
>>> timeit.timeit(lambda: {**large})
2.9151299450313672
>>> timeit.timeit(lambda: large.copy())
2.917799219954759
```
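For a more systematic view of the scaling question raised above, a rough script like the following could be used; it is a sketch, not part of the PR, and the absolute numbers will vary by machine and CPython version:

```python
# Rough benchmark sketch (not part of the PR) for how the three copy spellings
# scale with dict size; results depend on machine and CPython version.
import timeit

for n in (2, 100, 10_000):
    d = dict.fromkeys(range(n))
    for label, fn in (("dict(d)", lambda: dict(d)),
                      ("{**d}", lambda: {**d}),
                      ("d.copy()", lambda: d.copy())):
        t = timeit.timeit(fn, number=10_000)
        print(f"n={n:<7} {label:<10} {t:.4f}s")
```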
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of this should be surprising if you look at the output bytecode:
- `dict` becomes `LOAD_GLOBAL`, which involves 2 module lookups (one for `globals()` and one for `builtins` -- both of which are hashtables). You can write `dict = MyDict` in global scope, so this is very hard to optimize.
- `copy` becomes `LOAD_METHOD`, which involves a method lookup on an object (also a hashtable lookup). The object might not be a `dict`, so this is very hard to optimize.
- `{**x}` involves zero string lookups, since you get dict-specific bytecodes with no dynamic behavior.
The exact same logic follows for `[*x]` being faster than `list(x)`.
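A quick way to see those bytecode differences for yourself is the `dis` module; the helper function names below are made up for illustration, and the exact opcode names vary across CPython versions (e.g. `CALL_FUNCTION` vs. `PRECALL`/`CALL`):

```python
# Sketch for inspecting the bytecode of the three copy spellings; the function
# names here are illustrative only.
import dis

def via_dict(d):
    return dict(d)    # LOAD_GLOBAL dict, then a call

def via_copy(d):
    return d.copy()   # LOAD_METHOD/LOAD_ATTR copy, then a call

def via_unpack(d):
    return {**d}      # BUILD_MAP + DICT_UPDATE on recent CPython; no name lookup

for fn in (via_dict, via_copy, via_unpack):
    print(f"--- {fn.__name__} ---")
    dis.dis(fn)
```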
Pull Request resolved: #143817
Approved by: https://github.com/eellison
ghstack dependencies: #143813, #143814, #143815
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov