KEMBAR78
introduce per-file timing-stats by huguesb · Pull Request #12639 · python/mypy · GitHub
Skip to content

Conversation

@huguesb
Copy link
Contributor

@huguesb huguesb commented Apr 21, 2022

When profiling mypy over a large codebase, it can be useful
to know which files are slowest to typecheck.

Gather per-file timing stats and expose them through a new
(hidden) command line switch

When profiling mypy over a large codebase, it can be useful
to know which files are slowest to typecheck.

Gather per-file timing stats and expose them through a new
(hidden) command line switch
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I've wanted to have this feature myself when debugging performance issues.

Can you add at least one integration test that ensures that mypy can be run with the flag successfully? Possibly also check the format of the generated file. Obviously the timings aren't repeatable, so they would have to be ignored.

Also, the builds were failing.

@huguesb
Copy link
Contributor Author

huguesb commented Apr 22, 2022

Can you add at least one integration test that ensures that mypy can be run with the flag successfully? Possibly also check the format of the generated file. Obviously the timings aren't repeatable, so they would have to be ignored.

Done. I added a way to specify test output files via regex to achieve that.

Also, the builds were failing.

Yeah, I didn't realize mypy still supported py3.6
Fixed

@huguesb huguesb mentioned this pull request Apr 22, 2022
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@huguesb huguesb requested a review from JukkaL April 26, 2022 04:03
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few optional comments.

mypy/build.py Outdated
t0 = time.perf_counter()
with self.wrap_context():
self.type_checker().check_first_pass()
self.time_spent_us += int((time.perf_counter() - t0) * 1e6)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do int((time.perf_counter() - t0) * 1e6) a lot, it could be a bit cleaner to refactor (most of) it to a helper function. E.g. something like this:

def time_us() -> int:
    return int(time.perf_counter() * 1e6)

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that seems fine. That can then be refactored to use perf_counter_ns whenever support for py3.6 is dropped

Dump timing stats for each file in the given graph
"""
with open(path, 'w') as f:
for k in sorted(graph.keys()):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about sorting by the time spent in file instead of the module name? It will be easy to search by module name within the stats dump, but sorting by the time is slightly more involved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A straightforward sort -n -k2 on the output can give you time-sorting from the current format.

I originally thought about the module order being nice to get diffs between runs but the jitter is going to be high enough that such a diff would probably need a little extra massaging (e.g only show changes above a certain % threshold) to actually be helpful.

TL;DR I don't think the sort order makes much of a difference in practice so I would be fine going with either order, or even letting it be unsorted. In follow-up commit we can introduce a script similar to go's benchstat to get more value out of the raw data.

@github-actions

This comment has been minimized.

@huguesb huguesb force-pushed the pr-timing-stats branch 2 times, most recently from edeed8d to caddce9 Compare April 27, 2022 20:17
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good.

@JukkaL JukkaL merged commit d48d548 into python:master Apr 29, 2022
@huguesb huguesb deleted the pr-timing-stats branch April 29, 2022 18:48
@97littleleaf11
Copy link
Collaborator

cool feature!

@huguesb
Copy link
Contributor Author

huguesb commented Oct 11, 2022 via email

@JelleZijlstra
Copy link
Member

@huguesb did you comment on the wrong PR? For what it's worth we dropped 3.6 support.

@huguesb
Copy link
Contributor Author

huguesb commented Oct 11, 2022

uh, that's very weird. I did not intentionally send that... it looks relevant to the history of this PR but not after its closure, and I don't even see this message as a "sent" item my email client...

@JelleZijlstra
Copy link
Member

Maybe some email queue in GitHub had a very long delay :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants