Let mypyc optimise os.path.join by hauntsaninja · Pull Request #17949 · python/mypy · GitHub

Conversation

hauntsaninja (Collaborator) commented Oct 15, 2024

See #17948

There's one call site with varargs that I leave as os.path.join; it doesn't show up in my profile. I do see the `endswith` in the profile; we could try `path[-1] == '/'` instead (could save a few dozen milliseconds).
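For reference, a minimal sketch of the kind of specialized two-argument join that can replace os.path.join at these call sites (the helper name and exact logic here are illustrative, not necessarily the merged code):

```python
def join(path: str, b: str) -> str:
    # Simplified two-argument os.path.join for "/"-separated paths.
    # Avoiding the generic varargs stdlib implementation lets mypyc
    # compile this down to fast native string operations.
    if b.startswith("/"):
        # An absolute second component discards the first.
        return b
    if path == "" or path.endswith("/"):
        # endswith shows up on the profile; path[-1] == "/" may be
        # marginally faster, as noted above.
        return path + b
    return path + "/" + b
```

This matches os.path.join's behavior for the common two-argument, POSIX-separator case (it deliberately ignores os.altsep and drive handling).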

In my work environment, this is about a 10% speedup:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     30.842 s ±  0.119 s    [User: 26.383 s, System: 4.396 s]
  Range (min … max):   30.706 s … 30.927 s    3 runs
```

Compared to:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy  -c "import torch" --no-incremental --python-executable /opt/oai/bin/python
  Time (mean ± σ):     34.161 s ±  0.163 s    [User: 29.818 s, System: 4.289 s]
  Range (min … max):   34.013 s … 34.336 s    3 runs
```

In the toy "long" environment mentioned in the issue, this is about a 7% speedup:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable long/bin/python'
Benchmark 1: /tmp/mypy_primer/timer_mypy_6eddd3ab1/venv/bin/mypy  -c "import torch" --no-incremental --python-executable long/bin/python
  Time (mean ± σ):     23.177 s ±  0.317 s    [User: 20.265 s, System: 2.873 s]
  Range (min … max):   22.815 s … 23.407 s    3 runs
```

Compared to:

```
λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental'
Benchmark 1: /tmp/mypy_primer/timer_mypy_88ae62b4a/venv/bin/mypy -c "import torch" --python-executable=long/bin/python --no-incremental
  Time (mean ± σ):     24.838 s ±  0.237 s    [User: 22.038 s, System: 2.750 s]
  Range (min … max):   24.598 s … 25.073 s    3 runs
```

In the "clean" environment, this is a 1% speedup, but below the noise floor.
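As a sanity check, the percentage speedups quoted above follow directly from the hyperfine means:

```python
# Quick check of the reported speedups from the hyperfine output above.
baseline_work, optimized_work = 34.161, 30.842  # work environment, seconds
baseline_long, optimized_long = 24.838, 23.177  # toy "long" environment

speedup_work = (baseline_work - optimized_work) / baseline_work * 100
speedup_long = (baseline_long - optimized_long) / baseline_long * 100

print(f"work env: {speedup_work:.1f}%")  # ~9.7%, i.e. "about 10%"
print(f"long env: {speedup_long:.1f}%")  # ~6.7%, i.e. "about 7%"
```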

hauntsaninja and others added 2 commits October 14, 2024 23:07
github-actions (Contributor)

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

JukkaL (Collaborator) left a comment


This is great! It's amazing that we spent so much CPU in os.path.join.

JukkaL merged commit fea947a into python:master Oct 15, 2024
18 checks passed
hauntsaninja deleted the mypyc-join branch October 15, 2024 09:07
hauntsaninja added a commit that referenced this pull request Oct 20, 2024
