Migrate `torch.lstsq` to ATen by peterbell10 · Pull Request #59400 · pytorch/pytorch · GitHub

@peterbell10 peterbell10 commented Jun 3, 2021

Closes #24726, closes #44011

This builds on the port from #44011. I've rebased on master and addressed @mruberry's comments. There were also some unnecessary copies of `B` taking place, which I've cleaned up. This function is already deprecated, but since it's the last LAPACK routine in TH, it's still worth porting.
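
For readers unfamiliar with the routine being ported: `torch.lstsq(B, A)` solves the least-squares problem of finding `X` that minimizes `||A X - B||`, and its documented replacement is `torch.linalg.lstsq(A, B)`, which takes the arguments in the opposite order. The following is only a minimal, dependency-free sketch of that problem, assuming a full-column-rank `A` and a single right-hand side. It uses the normal equations for brevity, which is not how the LAPACK-backed implementation in this PR works, and the helper name `least_squares` is hypothetical.

```python
# Illustrative only: solves min_x ||A x - b||_2 for a full-column-rank A
# via the normal equations (A^T A) x = A^T b. This is NOT the algorithm
# used by torch.lstsq; it only demonstrates the problem being solved.

def least_squares(A, b):
    """Return x minimizing ||A x - b||_2, with A given as a list of rows."""
    m, n = len(A), len(A[0])
    # Build the n x n matrix A^T A and the length-n vector A^T b.
    ata = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented system.
    aug = [row + [rhs] for row, rhs in zip(ata, atb)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A is rank deficient; normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Fit y = c0 + c1 * t to three points that lie exactly on y = 1 + 2 t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 3.0, 5.0]
x = least_squares(A, b)
```

Callers that still use the deprecated function would typically switch to `torch.linalg.lstsq(A, B).solution`, keeping the reversed argument order in mind.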

facebook-github-bot commented Jun 3, 2021

💊 CI failures summary and remediations

As of commit 7783fa0 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_6_clang9_noarch_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 03 21:57:10 test_udf_remote_message_delay...yUniqueId(created_on=0, local_id=0) to be created.
Jun 03 21:56:27 frame #13: c10::ThreadPool::main_loop(unsigned long) + 0x17a (0x7f33bb8b1e4a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 03 21:56:27 frame #14: <unknown function> + 0xc819d (0x7f33bb7c719d in /opt/conda/lib/libstdc++.so.6)
Jun 03 21:56:27 frame #15: <unknown function> + 0x76db (0x7f33d93f16db in /lib/x86_64-linux-gnu/libpthread.so.0)
Jun 03 21:56:27 frame #16: clone + 0x3f (0x7f33d911a71f in /lib/x86_64-linux-gnu/libc.so.6)
Jun 03 21:56:27 
Jun 03 21:56:27 ok (4.256s)
Jun 03 21:56:43   test_rpc_builtin_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (15.682s)
Jun 03 21:56:52   test_rpc_script_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (9.767s)
Jun 03 21:56:57   test_rref_to_here_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (4.253s)
Jun 03 21:57:05   test_udf_remote_message_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.264s)
Jun 03 21:57:10   test_udf_remote_message_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:569] Received error while processing request type 261: falseINTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Jun 03 21:57:10 Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):
Jun 03 21:57:10 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x7d (0x7fac61c4407d in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 03 21:57:10 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xde (0x7fac61c4279e in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 03 21:57:10 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x3b (0x7fac61c429eb in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 03 21:57:10 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0x664 (0x7fac66395f34 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 03 21:57:10 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >, std::shared_ptr<torch::distributed::rpc::LazyStreamContext>) const + 0x5b (0x7fac6637c43b in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 03 21:57:10 frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::shared_ptr<torch::distributed::rpc::LazyStreamContext>) const + 0x12e (0x7fac6f6452ee in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jun 03 21:57:10 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::shared_ptr<torch::distributed::rpc::LazyStreamContext>) const + 0x227 (0x7fac6637ab67 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 03 21:57:10 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::shared_ptr<torch::distributed::rpc::LazyStreamContext>) const + 0x39 (0x7fac6f646519 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jun 03 21:57:10 frame #8: <unknown function> + 0x450bef8 (0x7fac66383ef8 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

1 failure not recognized by patterns:

Job Step Action
GitHub Actions Linux CI (pytorch-linux-xenial-py3.6-gcc5.4) / render_test_results Unknown 🔁 rerun

@peterbell10 peterbell10 force-pushed the lstsq-aten branch 2 times, most recently from 05a4eb1 to a1ab454 on June 3, 2021 19:03
@mruberry mruberry requested a review from ngimel June 4, 2021 13:31
@mruberry mruberry added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 4, 2021

ngimel commented Jun 4, 2021

@mruberry are we deprecating lstsq, or porting it?

mruberry commented Jun 6, 2021

> @mruberry are we deprecating lstsq, or porting it?

It's already deprecated and we plan to remove it. There are a few internal uses, however, that we have to be sure not to break.

ngimel commented Jun 6, 2021

Ok so I take it we still need to port it then.

@facebook-github-bot

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@ngimel merged this pull request in 390fe74.

deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary:
Closes pytorch#24726, closes pytorch#44011

This builds on the port from pytorch#44011. I've rebased on master and addressed mruberry's comments. There were also some unnecessary copies of `B` taking place that I've cleaned up. This function is already deprecated, but since it's the last lapack routine in TH, it's still worth porting.

Pull Request resolved: pytorch#59400

Reviewed By: mruberry

Differential Revision: D28922060

Pulled By: ngimel

fbshipit-source-id: cfd7ec8b50d2ab886f0e04a2a557e4e410ee8184

Labels

cla signed Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Development

Successfully merging this pull request may close these issues.

Migrate lstsq from the TH to Aten (CPU)

6 participants