Migrate crossKernel from THC to ATen (CUDA) #60039

peterbell10 · 2021-06-15T21:07:30Z

Ref #24507 (There doesn't seem to be an actual issue for cross)

This also moves the remaining operator functors in THCTensorMathPointwise.cuh to SparseCUDATensorMath.cu which is the only file using them.

facebook-github-bot · 2021-06-15T21:07:35Z

💊 CI failures summary and remediations

As of commit 697c739 (more details on the Dr. CI page and at hud.pytorch.org/pr/60039):

3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_linux_xenial_py3_6_gcc5_4_build (1/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.

CONFLICT (add/add): Merge conflict in .circleci/scripts/windows_cuda_install.sh
Auto-merging .circleci/scripts/windows_cuda_install.sh
CONFLICT (add/add): Merge conflict in .circleci/scripts/setup_ci_environment.sh
Auto-merging .circleci/scripts/setup_ci_environment.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

pytorch_xla_linux_bionic_py3_6_clang9_build (2/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.

CONFLICT (add/add): Merge conflict in .circleci/scripts/windows_cuda_install.sh
Auto-merging .circleci/scripts/windows_cuda_install.sh
CONFLICT (add/add): Merge conflict in .circleci/scripts/setup_ci_environment.sh
Auto-merging .circleci/scripts/setup_ci_environment.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

pytorch_macos_10_13_py3_test (3/3)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jun 23 00:20:37 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)

Jun 23 00:20:37   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 23 00:20:37     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 23 00:20:37   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 23 00:20:37     lambda_="pearson")
Jun 23 00:20:37   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 23 00:20:37     raise ValueError(msg)
Jun 23 00:20:37 ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
Jun 23 00:20:37 0.008265582255680495
Jun 23 00:20:37 
Jun 23 00:20:37 ======================================================================
Jun 23 00:20:37 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)
Jun 23 00:20:37 ----------------------------------------------------------------------
Jun 23 00:20:37 Traceback (most recent call last):
Jun 23 00:20:37   File "distributions/test_distributions.py", line 1333, in test_poisson_sample
Jun 23 00:20:37     failure_rate=1e-3)
Jun 23 00:20:37   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 23 00:20:37     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 23 00:20:37   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 23 00:20:37     lambda_="pearson")
Jun 23 00:20:37   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 23 00:20:37     raise ValueError(msg)

This comment was automatically generated by Dr. CI (expand for details).

Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

peterbell10 · 2021-06-15T21:34:44Z

I measure a performance improvement here:

Shape	Before	After
(100, 100, 100, 3)	99.2 us	93.9 us
(100, 100, 3)	4.91 us	3.91 us

ngimel · 2021-06-22T21:59:50Z

aten/src/ATen/native/cuda/CrossKernel.cu

+                         int64_t x1stride, int64_t x2stride) {
+  const auto N = iter.numel();
+  auto offset_calculator = make_element_offset_calculator<3>(iter);
+  TORCH_INTERNAL_ASSERT(N > 0 && N <= std::numeric_limits<int32_t>::max());


Is this check enough? e.g. if you have a tensors with <INT_MAX element, but it's not memory dense, so maximum offset is >INT_MAX (or UINT_MAX)

I could assert iter.can_use_32bit_indexing() but that's a more costly check and is guarunteed by cross_impl using with_32bit_indexing anyway.

Then with_32bit_indexing should guarantee that N is less than INT_MAX? Or it's here to protect people directly calling launch_cross_kernel, not going through cross_impl? In this case probably should be TORCH_INTERNAL_ASSERT_DEBUG_ONLY (unlikely someone will call launch_cross_kernel directly).

facebook-github-bot · 2021-06-22T22:10:09Z

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-06-23T01:00:57Z

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-06-23T07:40:36Z

@ngimel merged this pull request in 3f3fd57.

peterbell10 added module: porting Issues related to porting TH/THNN legacy to ATen native open source labels Jun 15, 2021

peterbell10 requested a review from ngimel June 15, 2021 21:07

facebook-github-bot added the cla signed label Jun 15, 2021

peterbell10 force-pushed the cross-aten branch from 57daff8 to 3e33d9c Compare June 15, 2021 21:09

Migrate crossKernel from THC to ATen (CUDA)

614ebfc

peterbell10 force-pushed the cross-aten branch from 3e33d9c to 614ebfc Compare June 15, 2021 22:24

anjali411 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 16, 2021

ngimel reviewed Jun 22, 2021

View reviewed changes

Assert only in debug builds

697c739

ngimel approved these changes Jun 22, 2021

View reviewed changes

facebook-github-bot closed this in 3f3fd57 Jun 23, 2021

facebook-github-bot added the Merged label Jun 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Migrate crossKernel from THC to ATen (CUDA) #60039

Migrate crossKernel from THC to ATen (CUDA) #60039

Uh oh!

peterbell10 commented Jun 15, 2021

Uh oh!

facebook-github-bot commented Jun 15, 2021 •

edited

Loading

Uh oh!

peterbell10 commented Jun 15, 2021

Uh oh!

ngimel Jun 22, 2021

Uh oh!

peterbell10 Jun 22, 2021

Uh oh!

ngimel Jun 22, 2021

Uh oh!

facebook-github-bot commented Jun 22, 2021

Uh oh!

facebook-github-bot commented Jun 23, 2021

Uh oh!

facebook-github-bot commented Jun 23, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Migrate crossKernel from THC to ATen (CUDA) #60039

Migrate crossKernel from THC to ATen (CUDA) #60039

Uh oh!

Conversation

peterbell10 commented Jun 15, 2021

Uh oh!

facebook-github-bot commented Jun 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💊 CI failures summary and remediations

🕵️ 3 new failures recognized by patterns

pytorch_linux_xenial_py3_6_gcc5_4_build (1/3)

pytorch_xla_linux_bionic_py3_6_clang9_build (2/3)

pytorch_macos_10_13_py3_test (3/3)

Uh oh!

peterbell10 commented Jun 15, 2021

Uh oh!

ngimel Jun 22, 2021

Choose a reason for hiding this comment

Uh oh!

peterbell10 Jun 22, 2021

Choose a reason for hiding this comment

Uh oh!

ngimel Jun 22, 2021

Choose a reason for hiding this comment

Uh oh!

facebook-github-bot commented Jun 22, 2021

Uh oh!

facebook-github-bot commented Jun 23, 2021

Uh oh!

facebook-github-bot commented Jun 23, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

facebook-github-bot commented Jun 15, 2021 •

edited

Loading