KEMBAR78
[Docs] Fix docs for torch.chunk by chedatomasz · Pull Request #61097 · pytorch/pytorch · GitHub
Skip to content

Conversation

@chedatomasz
Copy link
Contributor

@chedatomasz chedatomasz commented Jul 1, 2021

torch.chunk may return less than the requested number of chunks silently if some undocumented division constraints are not met. The functionality that users expect is provided by another function: torch.tensor_split

This has led to confusion countless times and who knows how many systems out there are fragile because of this.
My changes describe the discrepancy, show an example and direct users to the usually preferred function.

Issues mentioning this problem:
#9382
torch/torch7#617

I considered documenting the constraint for when an unexpected number of chunks may be returned (it is chunks*chunks>input.size[dim] ), so that users could quickly tell if their code may be affected. Please let me know if you think this should be in the docs or not.

Add an explanation covering pytorch#9382 and pytorch#36403, refer to torch.tensor_split
Also, modify the warning
@facebook-github-bot
Copy link
Contributor

Hi @chedatomasz!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jul 1, 2021

💊 CI failures summary and remediations

As of commit cebe4b4 (more details on the Dr. CI page and at hud.pytorch.org/pr/61097):



🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/scripts/binary_checkout.sh
Auto-merging .circleci/scripts/binary_checkout.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-win.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-win.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (2/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/scripts/binary_checkout.sh
Auto-merging .circleci/scripts/binary_checkout.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-win.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-win.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


1 failure not recognized by patterns:

Job Step Action
GitHub Actions Test tools / test Install dependencies 🔁 rerun

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Jul 15 06:03:12 unknown file: Failure
Jul 15 06:03:12 frame #7: build/bin/test_api() [0xc0b8c5]
Jul 15 06:03:12 frame #8: build/bin/test_api() [0xc0bb65]
Jul 15 06:03:12 frame #9: testing::internal::UnitTestImpl::RunAllTests() + 0xbf9 (0xc0cba9 in build/bin/test_api)
Jul 15 06:03:12 frame #10: testing::UnitTest::Run() + 0x8f (0xc0ce4f in build/bin/test_api)
Jul 15 06:03:12 frame #11: main + 0xc8 (0x5833a8 in build/bin/test_api)
Jul 15 06:03:12 frame #12: __libc_start_main + 0xf0 (0x7f474b9e3840 in /lib/x86_64-linux-gnu/libc.so.6)
Jul 15 06:03:12 frame #13: _start + 0x29 (0x5b9a19 in build/bin/test_api)
Jul 15 06:03:12 " thrown in the test body.
Jul 15 06:03:12 [  FAILED  ] IntegrationTest.MNIST_CUDA (3 ms)
Jul 15 06:03:12 [ RUN      ] IntegrationTest.MNISTBatchNorm_CUDA
Jul 15 06:03:12 unknown file: Failure
Jul 15 06:03:12 C++ exception with description "Error opening images file at test/cpp/api/mnist/train-images-idx3-ubyte
Jul 15 06:03:12 Exception raised from read_images at /var/lib/jenkins/workspace/torch/csrc/api/src/data/datasets/mnist.cpp:67 (most recent call first):
Jul 15 06:03:12 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f4765454c6b in /var/lib/jenkins/workspace/build/lib/libc10.so)
Jul 15 06:03:12 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7f476545047e in /var/lib/jenkins/workspace/build/lib/libc10.so)
Jul 15 06:03:12 frame #2: <unknown function> + 0x42015f2 (0x7f4769aa15f2 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Jul 15 06:03:12 frame #3: torch::data::datasets::MNIST::MNIST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 0x46 (0x7f4769aa2696 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Jul 15 06:03:12 frame #4: IntegrationTest_MNISTBatchNorm_CUDA_Test::TestBody() + 0x9d6 (0x7843b6 in build/bin/test_api)
Jul 15 06:03:12 frame #5: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x4a (0xc1489a in build/bin/test_api)
Jul 15 06:03:12 frame #6: build/bin/test_api() [0xc0b2d6]
Jul 15 06:03:12 frame #7: build/bin/test_api() [0xc0b8c5]

Preview docs built from this PR

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

1 similar comment
@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot. torch.chunk is sooooo bad. (cc @mruberry )

@facebook-github-bot
Copy link
Contributor

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor

ezyang commented Jul 12, 2021

could I trouble you to fix lint

@mruberry
Copy link
Collaborator

Thank you for adding the "see also" for tensor_split.

@chedatomasz
Copy link
Contributor Author

Fixed the errors shown by lint

@facebook-github-bot
Copy link
Contributor

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 0263865.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants