kill SkipInfo by ngimel · Pull Request #64878 · pytorch/pytorch

Conversation

@ngimel (Collaborator) commented Sep 11, 2021

Per offline discussion, this replaces SkipInfo with DecorateInfo. The SkipInfo class itself is not removed yet, to give functorch time to replace its SkipInfos.
cc @zou3519
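
For context, the mechanical change looks roughly like the sketch below. The test names, device type, and dtypes are illustrative only (not taken from this PR's diff), and the import path is the one used at the time of this PR:

```python
import unittest

import torch
from torch.testing._internal.common_methods_invocations import DecorateInfo

# Before this PR, an OpInfo entry would express a skip as, e.g.:
#   SkipInfo('TestUnaryUfuncs', 'test_reference_numerics_hard',
#            device_type='cuda', dtypes=[torch.cfloat])
# with the skip decorator implied by the class itself.
#
# With DecorateInfo, the decorator is passed explicitly, so the same
# mechanism covers skips, expected failures, and any other decorator.
skips = (
    DecorateInfo(unittest.skip("Skipped!"),
                 'TestUnaryUfuncs', 'test_reference_numerics_hard',
                 device_type='cuda', dtypes=[torch.cfloat]),
)
```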

@pytorch-probot bot commented Sep 11, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/25497c6a4f8231deb377afa85785bf8d9e44d360/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflow | Labels | Status
Triggered Workflows
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-bionic-py3.8-gcc9-coverage ciflow/all, ciflow/coverage, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda10.2-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
paralleltbb-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
puretorch-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Sep 11, 2021


💊 CI failures summary and remediations

As of commit 25497c6 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (1/1)

Step: "Unknown"

2021-09-13T20:13:24.7842414Z     return x.cpu() + y.cuda()
2021-09-13T20:13:24.7843656Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2021-09-13T20:13:24.7844280Z 
2021-09-13T20:13:24.7931644Z On WorkerInfo(id=2, name=worker2):
2021-09-13T20:13:24.7932845Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!',)
2021-09-13T20:13:24.7933641Z Traceback (most recent call last):
2021-09-13T20:13:24.7934663Z   File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py", line 204, in _run_function
2021-09-13T20:13:24.7935613Z     result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2021-09-13T20:13:24.7936801Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 5879, in _gpu_add_wrong_gpus
2021-09-13T20:13:24.7937959Z     return x.cpu() + y.cuda()
2021-09-13T20:13:24.7938857Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2021-09-13T20:13:24.7939998Z 
2021-09-13T20:13:25.3496262Z ok (6.226s)
2021-09-13T20:13:26.9671614Z   test_devices_option_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (1.617s)
2021-09-13T20:13:28.5845393Z   test_devices_option_mismatch_reverse (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (1.617s)
2021-09-13T20:13:35.9134931Z   test_owner_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (7.329s)
2021-09-13T20:13:45.2436487Z   test_owner_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (9.330s)
2021-09-13T20:13:54.1729329Z   test_owner_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (8.929s)
2021-09-13T20:14:01.0991544Z   test_owner_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (6.926s)
2021-09-13T20:14:16.8395611Z   test_rref_as_arg_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (15.740s)
2021-09-13T20:14:36.6888167Z   test_rref_as_arg_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (19.849s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@ngimel requested a review from mruberry September 11, 2021 03:58
@codecov bot commented Sep 11, 2021

Codecov Report

Merging #64878 (25497c6) into master (2ae938e) will decrease coverage by 0.08%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #64878      +/-   ##
==========================================
- Coverage   66.69%   66.60%   -0.09%     
==========================================
  Files         718      718              
  Lines       92693    92704      +11     
==========================================
- Hits        61822    61749      -73     
- Misses      30871    30955      +84     

@mruberry (Collaborator)

This seems like a fine simplification and a solution to our expected-failure vs. skip vs. decorator debate, cc @heitorschueroff. I would suggest we rename "skips" to "skips_and_efails" or something similar if we're going to keep it, so people know it's the place to put both skips and expected failures (for readability).

flake8 is complaining, however. Also, shouldn't this remove the actual SkipInfo class?
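
A sketch of what that would look like: one "skips" (or "skips_and_efails") tuple holding both kinds of entries. The test classes and test names below are hypothetical:

```python
import unittest

from torch.testing._internal.common_methods_invocations import DecorateInfo

# A single tuple attached to an OpInfo can now mix plain skips with
# expected failures, since both are just decorators wrapped in DecorateInfo.
skips = (
    # Skip this test entirely on CPU.
    DecorateInfo(unittest.skip("Skipped!"), 'TestGradients', 'test_fn_grad',
                 device_type='cpu'),
    # Expect this test to fail (and flag it once it starts passing).
    DecorateInfo(unittest.expectedFailure, 'TestCommon', 'test_out'),
)
```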

@heitorschueroff (Contributor)

If we are killing SkipInfo, then I suggest we remove skips as well, though this change will take more time and work than just find-and-replace, so we could also postpone it for the eventual refactoring of common_methods_invocations.

Natalia Gimelshein added 2 commits September 13, 2021 11:34
@ngimel force-pushed the ngimel/kill_skipInfo branch from e612e52 to 4558fcb on September 13, 2021 18:46
@pytorch-probot bot assigned pytorchbot and unassigned pytorchbot Sep 13, 2021
@facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519 (Contributor) commented Sep 13, 2021

We've replaced SkipInfo in functorch with DecorateInfo: pytorch/functorch@d535904

@facebook-github-bot (Contributor)

@ngimel merged this pull request in 1e25a84.

@ngimel (Collaborator, Author) commented Sep 14, 2021

> If we are killing SkipInfo, then I suggest we remove skips as well, though this change will take more time and work than just find-and-replace, so we could also postpone it for the eventual refactoring of common_methods_invocations.

@heitorschueroff you are absolutely right, and we should replace skips with expected fails whenever possible, like your PR was doing, but it's also a more complicated process than just search and replace.
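
The behavioral difference being argued for here, in plain unittest terms (a standalone sketch, not code from this PR):

```python
import unittest

class KnownIssueTests(unittest.TestCase):
    @unittest.skip("Skipped!")
    def test_skipped(self):
        # Never runs, so nobody notices when the underlying bug is fixed.
        self.fail("known issue")

    @unittest.expectedFailure
    def test_expected_failure(self):
        # Still runs; once it stops failing, the suite reports an
        # "unexpected success", prompting removal of the decorator.
        self.fail("known issue")

if __name__ == "__main__":
    unittest.main()
```

This is why converting skips to expected failures, where feasible, keeps the decoration lists from going stale.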

@ngimel deleted the ngimel/kill_skipInfo branch December 26, 2021 06:43
