Force all deserialized objects to the oldest GC generation #19681
Conversation
I just realized I did my measurements with the fixed-format cache, but I guess the numbers will be similar for the JSON cache.
Together with the fixed-format cache, `import torch` with a warm cache was ~90% faster than before for me, based on a quick experiment!
```python
# a hack, but it gives huge performance wins for large third-party
# libraries, like torch.
gc.collect()
gc.disable()
```
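The fragment above is only the first half of the trick; re-enabling the collector and the `freeze()`/`unfreeze()` step happen after deserialization. A minimal sketch of the full pattern, where `deserialize_fresh_sccs` and `process_fresh_sccs` are hypothetical stand-ins for mypy's cache-loading step, not the actual function names:

```python
import gc

def deserialize_fresh_sccs(sccs):
    """Hypothetical stand-in for mypy's cache deserialization."""
    ...

def process_fresh_sccs(sccs):
    # Deserializing the cache allocates a huge number of objects, so the
    # collector would otherwise do a lot of pointless work searching for
    # garbage that isn't there.
    gc.collect()   # clear existing garbage first
    gc.disable()   # no young-generation collections while we allocate
    try:
        deserialize_fresh_sccs(sccs)
    finally:
        gc.enable()
        # freeze() moves every tracked object into the permanent
        # generation; unfreeze() then drops them all into the oldest
        # generation, so future young-generation passes skip them.
        gc.freeze()
        gc.unfreeze()
```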
Could we get here multiple times if there are multiple dirty sub-DAGs? If so, do you think it will be a problem?
A quick workaround would be to do this at most N times per run (possibly N=1).
Yeah, I was thinking about this. FWIW, I don't think it will be a problem, since freeze/unfreeze are quite fast. We may also accidentally move some objects from the stale SCCs processed earlier into the oldest generation, but that is probably not so bad. That said, I think it is fine to start with just one pass per run and increase the limit as we get more data. (With `mypy -c 'import torch'` we enter here only once.)
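A minimal sketch of the "at most N passes per run" workaround discussed above; the counter and limit names are made up for illustration:

```python
import gc

_freeze_passes_done = 0        # hypothetical module-level counter
MAX_FREEZE_PASSES_PER_RUN = 1  # hypothetical limit (N=1 as suggested)

def maybe_freeze_deserialized_objects() -> None:
    """Run the freeze()/unfreeze() pass at most N times per run."""
    global _freeze_passes_done
    if _freeze_passes_done >= MAX_FREEZE_PASSES_PER_RUN:
        return
    _freeze_passes_done += 1
    gc.freeze()
    gc.unfreeze()
```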
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
I am not sure what is happening, but for some reason after the GC `freeze()`/`unfreeze()` hack #19681 was merged, compiled tests run twice as slow (on the GH runner; I also see a much smaller but still visible slowdown locally). I have two theories:

* The constant overhead we add outweighs the savings when running thousands of tiny builds.
* The 8% of extra memory we use pushes us over the runner's limit, because we were already very close to it.

In any case, I propose to try disabling this hack in most tests and see if it helps.
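One way to disable the hack in most tests would be a simple environment-variable kill switch; `MYPY_DISABLE_GC_FREEZE_HACK` is an invented name for this sketch, not an actual mypy option:

```python
import gc
import os

def gc_freeze_hack_enabled() -> bool:
    # Hypothetical kill switch; the variable name is made up.
    return os.environ.get("MYPY_DISABLE_GC_FREEZE_HACK") != "1"

def apply_gc_hack_if_enabled() -> None:
    if not gc_freeze_hack_enabled():
        return
    gc.collect()
    gc.disable()
```

The test harness could then set the variable once and keep the per-build overhead out of the thousands of tiny builds.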
This is a hack, but it gives a ~30% perf win for `mypy -c 'import torch'` on a warm run. This should not increase memory consumption too much, since we shouldn't create any cyclic garbage during deserialization (we do create some cyclic references, like `TypeInfo` -> `SymbolTable` -> `Instance` -> `TypeInfo`, but those are genuine long-lived objects).
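A small self-contained demonstration (not mypy code) of what the `freeze()`/`unfreeze()` pair does, using `gc.get_freeze_count()` to observe the permanent generation:

```python
import gc

# Simulate a burst of freshly deserialized objects.
data = [[i] for i in range(100_000)]

print(gc.get_freeze_count())  # 0: the permanent generation starts empty
gc.freeze()                   # move every tracked object to the permanent generation
print(gc.get_freeze_count())  # large: all tracked objects are now frozen
gc.unfreeze()                 # drop them into the oldest generation
print(gc.get_freeze_count())  # 0 again: the objects now live in generation 2
```

Once the objects sit in the oldest generation, the frequent generation-0 and generation-1 collections no longer traverse them, which is where the warm-run win comes from.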