add `OpInfo` for `torch.nn.functional.mse_loss` by pmeier · Pull Request #62254 · pytorch/pytorch · GitHub

Conversation

pmeier (Collaborator) commented on Jul 27, 2021
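
For orientation, OpInfo entries are added to the op_db list in torch/testing/_internal/common_methods_invocations.py. The snippet below is only a rough sketch of the general shape such an entry takes, not this PR's actual diff; the sample_inputs_mse_loss helper is a hypothetical name standing in for whatever generator the entry references.

```python
# Rough sketch of an OpInfo entry for nn.functional.mse_loss -- not this PR's
# actual diff. Assumes the conventions of
# torch/testing/_internal/common_methods_invocations.py, where such entries are
# appended to the op_db list.
import torch
from torch.testing._internal.common_methods_invocations import OpInfo, SampleInput


def sample_inputs_mse_loss(op_info, device, dtype, requires_grad, **kwargs):
    # Hypothetical generator name; a fuller small-shape sketch appears later in the thread.
    inp = torch.randn(2, 3, device=device, dtype=dtype, requires_grad=requires_grad)
    target = torch.randn(2, 3, device=device, dtype=dtype)
    return [SampleInput(inp, args=(target,))]


mse_loss_opinfo = OpInfo(
    "nn.functional.mse_loss",               # resolved to torch.nn.functional.mse_loss
    dtypes=(torch.float32, torch.float64),  # dtypes the generated tests exercise on CPU
    sample_inputs_func=sample_inputs_mse_loss,
    supports_out=False,                     # mse_loss has no out= variant
)
```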

facebook-github-bot (Contributor) commented on Jul 27, 2021

💊 CI failures summary and remediations

As of commit 4c62164 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Aug 02 18:50:28 [E request_callback_no_python.c...yUniqueId(created_on=0, local_id=0) to be created.
Aug 02 18:50:17 ok (8.853s)
Aug 02 18:50:19   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:19 [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:19 [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:19 [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:24 ok (7.376s)
Aug 02 18:50:26   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:26 [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:26 [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:26 [W tensorpipe_agent.cpp:186] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
Aug 02 18:50:28 [E request_callback_no_python.cpp:555] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rref_context.cpp":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Aug 02 18:50:28 Exception raised from getOwnerRRef at ../torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):
Aug 02 18:50:28 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x10bf30652 in libc10.dylib)
Aug 02 18:50:28 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 106 (0x10bf2edca in libc10.dylib)
Aug 02 18:50:28 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 64 (0x10bf2f000 in libc10.dylib)
Aug 02 18:50:28 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 1711 (0x11b2675ff in libtorch_cpu.dylib)
Aug 02 18:50:28 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 86 (0x11b251e56 in libtorch_cpu.dylib)
Aug 02 18:50:28 frame #5: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 376 (0x1171e6768 in libtorch_python.dylib)
Aug 02 18:50:28 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 437 (0x11b250aa5 in libtorch_cpu.dylib)
Aug 02 18:50:28 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 74 (0x1171e74da in libtorch_python.dylib)
Aug 02 18:50:28 frame #8: c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> > c10::ivalue::Future::thenAsync<torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1>(torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1, std::__1::shared_ptr<c10::Type>)::'lambda'(c10::ivalue::Future&)::operator()(c10::ivalue::Future&) + 223 (0x11b25876f in libtorch_cpu.dylib)

This comment was automatically generated by Dr. CI.

pmeier requested review from mruberry and zou3519 on Jul 27, 2021, 09:52
pmeier (Collaborator, Author) commented on Jul 27, 2021

Not sure about this failure: https://app.circleci.com/pipelines/github/pytorch/pytorch/356512/workflows/199959e4-7e76-4245-972e-88fa1de4872d/jobs/15029588/tests#failed-test-0

A lot of other OpInfos seem to skip it, so I'm not sure whether it is warranted to do the same here.
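
For reference, the skip mechanism other OpInfo entries use for this generated test is the skips= field of the entry. A minimal sketch of that pattern follows, assuming the DecorateInfo helper from common_methods_invocations.py; it is illustrative only and may not match the skip this PR eventually landed with.

```python
# Minimal sketch of how other OpInfo entries skip a known-failing generated
# test -- illustrative only, and possibly not the exact form this PR used.
# Assumes DecorateInfo from torch/testing/_internal/common_methods_invocations.py.
import unittest

from torch.testing._internal.common_methods_invocations import DecorateInfo

mse_loss_skips = (
    # Targets TestJit.test_variant_consistency_jit, the test failing in the CI
    # run linked above (the alias-annotation check hits an internal assert).
    DecorateInfo(unittest.skip("Skipped!"), "TestJit", "test_variant_consistency_jit"),
)
# The tuple would then be attached to the entry as OpInfo(..., skips=mse_loss_skips).
```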

zou3519 (Contributor) commented on Jul 27, 2021

Test failures look real:

======================================================================
ERROR [0.070s]: test_variant_consistency_jit_nn_functional_mse_loss_cpu_float32 (__main__.TestJitCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 378, in instantiated_test
    raise rte
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
    result = test(self, **param_kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 780, in test_wrapper
    return test(*args, **kwargs)
  File "test_ops.py", line 731, in test_variant_consistency_jit
    func_type=func_type, aten_name=op.aten_name)
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/jit_metaprogramming_utils.py", line 490, in check_alias_annotation
    torch._C._jit_check_alias_annotation(CU.the_method.graph, tuple(tensors), aten_name)
RuntimeError: aliasOp != torch::jit::getOperatorAliasMap().end()INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/jit/passes/utils/check_alias_annotation.cpp":159, please report a bug to PyTorch. 

I'm not sure what the right way to handle this is. @mruberry, @eellison -- is adding a Skip and then filing an issue the way to go?

ejguan added the `triaged` label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jul 27, 2021
pmeier added the `module: nn` (Related to torch.nn) and `module: testing` (Issues related to the torch.testing module, not tests) labels on Jul 28, 2021
mruberry (Collaborator) commented:

> Test failures look real:
>
> ERROR [0.070s]: test_variant_consistency_jit_nn_functional_mse_loss_cpu_float32 (__main__.TestJitCPU)
> [... full traceback quoted above ...]
> RuntimeError: aliasOp != torch::jit::getOperatorAliasMap().end()INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/jit/passes/utils/check_alias_annotation.cpp":159, please report a bug to PyTorch.
>
> I'm not sure what the right way to handle this is. @mruberry, @eellison -- is adding a Skip and then filing an issue the way to go?

Skipping seems OK for now. My guess is there's an issue with the test. I wouldn't bother filing an issue unless @eellison would like one.

mruberry (Collaborator) commented:

The mypy errors are unrelated. I would just skip the test, and if you could tweak the sample inputs generator that would be great -- large sample inputs can take a long time to test.

@zou3519 would you shepherd this through?
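
On the sample inputs point above: a generator that keeps shapes small might look roughly like the sketch below. The function name, the specific shapes, and the import locations are assumptions following the SampleInput/make_tensor conventions of common_methods_invocations.py, not the PR's actual code.

```python
# Illustrative sketch of a small-shape sample-inputs generator -- not the PR's
# actual code. Assumes the SampleInput / make_tensor conventions of
# torch/testing/_internal/common_methods_invocations.py.
from functools import partial

from torch.testing import make_tensor
from torch.testing._internal.common_methods_invocations import SampleInput


def sample_inputs_mse_loss(op_info, device, dtype, requires_grad, **kwargs):
    # Small shapes keep gradcheck and the JIT variant-consistency tests fast,
    # which is the concern raised above about large sample inputs.
    make_input = partial(make_tensor, device=device, dtype=dtype, requires_grad=requires_grad)
    make_target = partial(make_tensor, device=device, dtype=dtype)

    samples = [
        SampleInput(make_input(()), args=(make_target(()),)),         # scalar input/target
        SampleInput(make_input((2, 3)), args=(make_target((2, 3)),)),  # default reduction="mean"
    ]
    for reduction in ("none", "mean", "sum"):
        samples.append(
            SampleInput(
                make_input((2, 3)),
                args=(make_target((2, 3)),),
                kwargs={"reduction": reduction},
            )
        )
    return samples
```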

facebook-github-bot (Contributor) commented:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor) commented:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor) commented:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor) commented:

@zou3519 merged this pull request in 2cf4d81.

Labels: cla signed, Merged, module: nn, module: testing, open source, triaged
