Move torch.logspace to ATen and parallelize on CPU. by gchanan · Pull Request #15438 · pytorch/pytorch

Conversation

@gchanan (Contributor) commented Dec 20, 2018

No description provided.

@facebook-github-bot (Contributor) left a comment:

@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan (Contributor, Author) commented Dec 20, 2018

Performance comparisons:
Old:

>>> timeit torch.logspace(0,5,512,device='cuda')
23.9 µs ± 2.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> timeit torch.logspace(0,5,3*512*512,device='cuda')
23.3 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> timeit torch.logspace(0,5,512)
44.1 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> timeit torch.logspace(0,5,3*512*512)
64.1 ms ± 732 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

OMP_NUM_THREADS=1
>>> timeit torch.logspace(0,5,512)
44.8 µs ± 586 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

OMP_NUM_THREADS=1
>>> timeit torch.logspace(0,5,3*512*512)
62.3 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

New:

>>> timeit torch.logspace(0,5,512,device='cuda')
20.5 µs ± 1.41 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> timeit torch.logspace(0,5,3*512*512,device='cuda')
21.1 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> timeit torch.logspace(0,5,512)
45.5 µs ± 316 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> timeit torch.logspace(0,5,3*512*512)
3.73 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

OMP_NUM_THREADS=1
>>> timeit torch.logspace(0,5,512)
46.9 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

OMP_NUM_THREADS=1
>>> timeit torch.logspace(0,5,3*512*512)
65 ms ± 797 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
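
In short: the parallelized CPU path is roughly 17x faster on the large 3*512*512 case (64.1 ms → 3.73 ms), while the single-threaded (OMP_NUM_THREADS=1) and CUDA timings are essentially unchanged. For illustration only, a CPU logspace kernel parallelized with at::parallel_for might look roughly like the sketch below; the function name, grain size, and dispatch details are assumptions, not this PR's exact code.

// Minimal sketch of a parallelized CPU logspace kernel (illustrative only;
// names and constants are assumptions, not the code from this PR).
// Assumes a contiguous result tensor for brevity.
#include <ATen/ATen.h>
#include <ATen/Parallel.h>
#include <cmath>

at::Tensor& logspace_cpu_out_sketch(at::Tensor& result, double start, double end, int64_t steps) {
  AT_CHECK(steps >= 0, "number of steps must be non-negative");
  if (result.numel() != steps) {
    result.resize_({steps});
  }
  if (steps == 0) {
    return result;
  }
  if (steps == 1) {
    result.fill_(std::pow(10.0, start));
    return result;
  }
  AT_DISPATCH_FLOATING_TYPES(result.type(), "logspace_cpu", [&] {
    scalar_t* data = result.data<scalar_t>();
    double step = (end - start) / (steps - 1);
    // Each index is independent, so the fill splits cleanly across threads.
    at::parallel_for(0, steps, /*grain_size=*/2048, [&](int64_t begin, int64_t chunk_end) {
      for (int64_t i = begin; i < chunk_end; ++i) {
        data[i] = static_cast<scalar_t>(std::pow(10.0, start + i * step));
      }
    });
  });
  return result;
}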

@facebook-github-bot (Contributor) left a comment:

@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

AT_CHECK(steps >= 0, "number of steps must be non-negative");

if (result.numel() != steps) {
  result.resize_({steps});
}
Contributor:

Huh, interesting that we're willing to write into any tensor that has the correct numel. Well, I suppose it's handled correctly below.

Contributor (Author):

Yes, it's strange, but that was the existing behavior.
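
For reference, the usual ATen pattern for such out= factory kernels (plausibly the "handled correctly below" part, though the rest of the diff is not shown here) is to resize only when the element count differs and to write through a contiguous temporary, copying back if needed. A generic sketch, not necessarily this PR's exact code:

// Generic out= handling sketch: a result with the right numel is written into
// as-is; a non-contiguous result is filled via a contiguous temporary.
#include <ATen/ATen.h>

at::Tensor& fill_out_sketch(at::Tensor& result, int64_t steps) {
  AT_CHECK(steps >= 0, "number of steps must be non-negative");
  if (result.numel() != steps) {
    result.resize_({steps});  // only reshape when the element count differs
  }
  at::Tensor r = result.is_contiguous() ? result : result.contiguous();
  // ... fill r's flat storage with the steps values here ...
  if (!result.is_contiguous()) {
    result.copy_(r);  // write the values back into the original layout
  }
  return result;
}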

zdevito pushed a commit to zdevito/ATen that referenced this pull request Dec 21, 2018
Summary: Pull Request resolved: pytorch/pytorch#15438

Reviewed By: ezyang

Differential Revision: D13529626

Pulled By: gchanan

fbshipit-source-id: 896e8afee3d6b5a706c4f5815b91ba6bd8af6672
@ezyang added the merged label Jun 25, 2019