[inductor] Minor refactor of hip compile_meta by jansel · Pull Request #143815 · pytorch/pytorch · GitHub

Conversation


pytorch-bot bot commented Dec 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143815

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1e13b0a with merge base a8ac3a6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jansel removed the ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm) and ciflow/rocm (Trigger "default" config CI on ROCm) labels Dec 25, 2024
compile_meta["constants"][k] = v
cfg_kwargs = cfg.kwargs
if self.device_props.type == "hip":
    cfg_kwargs = {**cfg_kwargs}
Skylion007 (Collaborator) commented Dec 25, 2024

Suggested change:
-    cfg_kwargs = {**cfg_kwargs}
+    cfg_kwargs = dict(cfg_kwargs)

Any reason not to just use dict() as suggested above? Or just explicitly make a shallow copy?

jansel (Contributor, Author) commented:

@Skylion007 the version I have is 1.5x faster:

>>> other={1:2, 3:4}
>>> timeit.timeit(lambda: dict(other))
0.09024659299757332
>>> timeit.timeit(lambda: {**other})
0.05995289294514805
>>> 

Skylion007 (Collaborator) commented Dec 26, 2024

And what about other.copy()? Also @jansel, what about as the dict grows? My understanding is that dict() may have more initial overhead due to the function-call lookup, but better scaling with larger dicts.

Suggested change:
-    cfg_kwargs = {**cfg_kwargs}
+    cfg_kwargs = cfg_kwargs.copy()

jansel (Contributor, Author) commented Dec 26, 2024

@Skylion007 other.copy() is also slower:

>>> timeit.timeit(lambda: other.copy())
0.075550084002316

For large dicts all 3 become the same (though {**other} is slightly faster):

>>> large = dict.fromkeys(range(1000))
>>> timeit.timeit(lambda: dict(large))
2.926349258981645
>>> timeit.timeit(lambda: {**large})
2.9151299450313672
>>> timeit.timeit(lambda: large.copy())
2.917799219954759

jansel (Contributor, Author) commented:

None of this should be surprising if you look at the output bytecode:

  • dict becomes LOAD_GLOBAL, which involves 2 module lookups (one for globals() and one for builtins -- both of which are hashtables). You can write dict=MyDict in global scope, so this is very hard to optimize.
  • copy becomes LOAD_METHOD, which involves a method lookup on an object (also a hashtable lookup). The object might not be a dict, so this is very hard to optimize.
  • {**x} involves zero string lookups, since you get DICT-specific bytecodes with no dynamic behavior.

The exact same logic is what makes [*x] faster than list(x).
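
A minimal sketch of how to inspect these bytecodes with the standard-library dis module (the exact opcode names vary across CPython versions, e.g. {**x} compiles to BUILD_MAP/DICT_UPDATE on 3.9+; the name "other" here is just illustrative):

import dis

# dis only disassembles the compiled expressions, it never executes them,
# so "other" does not need to be defined.

# dict(other): the name "dict" is looked up in globals, then builtins, before the call
dis.dis(compile("dict(other)", "<copy>", "eval"))

# other.copy(): the "copy" method is looked up on the object before the call
dis.dis(compile("other.copy()", "<copy>", "eval"))

# {**other}: dict-specific opcodes only, no name or attribute lookups
dis.dis(compile("{**other}", "<copy>", "eval"))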

pytorchmergebot pushed a commit that referenced this pull request Dec 27, 2024
github-actions bot deleted the gh/jansel/465/head branch January 26, 2025 02:05
