KEMBAR78
Force all deserialized objects to the oldest GC generation by ilevkivskyi · Pull Request #19681 · python/mypy · GitHub
Skip to content

Conversation

@ilevkivskyi
Copy link
Member

This is a hack, but it gives ~30% perf win for mypy -c 'import torch' on a warm run. This should not increase memory consumption too much, since we shouldn't create any cyclic garbage during deserialization (we do create some cyclic references, like TypeInfo -> SymbolTable -> Instance -> TypeInfo, but those are genuine long-living objects).

@ilevkivskyi
Copy link
Member Author

I just realized I did my measurements with fixed-format cache, but I guess the numbers will be similar for JSON cache.

@github-actions

This comment has been minimized.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Together with the fixed-format cache, import torch with a warm cache was ~90% faster than before for me, based on a quick experiment!

# a hack, but it gives huge performance wins for large third-party
# libraries, like torch.
gc.collect()
gc.disable()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get here multiple times, if there are multiple dirty sub-DAGs? If yes, do you think it'll be a problem?

A quick workaround would be to do this only at most N times per run (possibly N=1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about this. FWIW, I don't think it will be a problem, since freeze/unfreeze are quite fast. Also, we may accidentally get some objects from the stale SCCs previously processed in the oldest generation, but it is probably not so bad. But also I think it is fine to start with just one pass per run and increase the limit as we get more data for this.

(With mypy -c 'import torch' we enter here only once)

@github-actions
Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@JukkaL JukkaL merged commit 6c5b13c into python:master Aug 19, 2025
20 checks passed
@ilevkivskyi ilevkivskyi deleted the freeze-unfreeze branch August 19, 2025 18:04
ilevkivskyi added a commit that referenced this pull request Aug 30, 2025
I am not sure what happens, but for some reason after GC
`freeze()`/`unfreeze()` hack #19681
was merged, compiled tests are running twice slower (on GH runner, but I
also see much smaller but visible slow-down locally). I have two
theories:
* The constant overhead we add outweighs the savings when running
thousands of tiny builds.
* The 8% of extra memory we use goes over the limit in the runner
because we were already very close to it.

In any case, I propose to try disabling this hack in most tests and see
if it helps.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants