Added support for Constants according to Python Array API stipulation #59910
Conversation
Hi @praneethratna! Thank you for your pull request and welcome to our community.

**Action Required:** In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process:** In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
💊 CI failures summary and remediations

As of commit 1b2c54b (more details on the Dr. CI page and at hud.pytorch.org/pr/59910):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your pull requests. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Hey @praneethratna and thanks for the PR! I have one comment. Otherwise LGTM.
Summary: Pull Request resolved: pytorch#59709. Fixes pytorch#59705. Python 3.8 fixed tracebacks to report the beginning of the line that raised an error, rather than the end. This makes for a simpler implementation (no more string reversing), but we need to actually account for the change. This wasn't caught by tests because we hard-coded line numbers to do substitutions, so I also added a little smoke test to detect future changes to traceback line-number behavior.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D28994919
Pulled By: ezyang
fbshipit-source-id: 1fb0a782e17c55c13d668fabd04766d2b3811962
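For context, a minimal sketch of the 3.8 behavior change this commit describes (illustrative only; this is not the smoke test the commit adds):

```python
import sys
import traceback

def raises_on_multiline():
    # A statement spanning two lines; the NameError is raised while
    # evaluating the second line.
    return (1 +
            unknown_name)

try:
    raises_on_multiline()
except NameError:
    frame = traceback.extract_tb(sys.exc_info()[2])[-1]
    # On Python >= 3.8, frame.lineno points at the beginning of the
    # multi-line statement ("return (1 +"); on 3.7 it pointed at the end.
    print(frame.lineno, frame.line)
```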
Summary: Pull Request resolved: pytorch#59758. The underlying call to tp_getattr is const-safe, but CPython has not fixed the declaration due to BC concerns. No reason not to advertise the better type here, though!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D29017911
Pulled By: ezyang
fbshipit-source-id: 8d55983fe6416c03eb69c6367bcc431c30000133
… (pytorch#59195)

Summary: Currently, a softmax that is not along the last dim falls back to a [scalar version](https://github.com/pytorch/pytorch/blob/d417a094f398f1c4efd7f818b14b8471a597fbcc/aten/src/ATen/native/SoftMax.cpp#L14-L64). We found that we can actually vectorize the calculation along the inner_size dim.

Changes we made:
- Use the vectorized softmax_kernel instead of host_softmax when the softmax is not along the last dim.

Performance data on a 28-core Intel 8280 CPU, with input size [32, 81, 15130] and softmax along the second dim (81):
- FP32 baseline: 24.67 ms
- FP32 optimized: 9.2 ms

Pull Request resolved: pytorch#59195
Reviewed By: ailzhang
Differential Revision: D28854796
Pulled By: cpuhrsch
fbshipit-source-id: 18477acc3963754c59009b1794f080496ae16c3d
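For reference, a small sketch that exercises the newly vectorized path (the shape mirrors the benchmark above):

```python
import torch

# Softmax along dim=1 (not the last dim) previously hit the scalar
# fallback on CPU; it now takes the vectorized kernel.
x = torch.randn(32, 81, 15130)
y = torch.softmax(x, dim=1)

# Each slice along dim=1 sums to 1.
assert torch.allclose(y.sum(dim=1), torch.ones(32, 15130))
```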
Summary: Some minor quality-of-life improvements for the NNC python bindings:
- expose `call_raw()`
- support passing integers to `call()` (for dynamic shapes)
- implicit conversions, so that `[BufferArg(x) for x in [A, B, C]]` can be written as just `[A, B, C]`
- don't silently default to "ir_eval" for an unknown mode (e.g. "LLVM")

Pull Request resolved: pytorch#59920
Reviewed By: ZolotukhinM
Differential Revision: D29090904
Pulled By: jansel
fbshipit-source-id: 154ace82725ae2046cfe2e6eb324fd37f5d209a7
…error (pytorch#59918)

Summary: Pull Request resolved: pytorch#59918. Reland of pytorch#59684.
ghstack-source-id: 131303057
Test Plan: ci
Reviewed By: cbalioglu
Differential Revision: D29081452
fbshipit-source-id: 419df79341f702e796f7adf5f1071a6cd1dcd8d1
Summary: Pull Request resolved: pytorch#59912
Reviewed By: soulitzer
Differential Revision: D29100518
Pulled By: albanD
fbshipit-source-id: b86a4aa9050e4fa70a0872c1d8799e5953cd2bc8
…ist; python test added to verify the test (pytorch#57574)

Summary: Pull Request resolved: pytorch#57574
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D29038774
Pulled By: Krovatkin
fbshipit-source-id: cb342c1b04fa3713a8166b39213437bc9f2d8606
Summary: Pull Request resolved: pytorch#57575

This PR does two things:
1. Reverts "Manual revert of D27369251 (pytorch@f88a3ff) (pytorch#56080)" in commit 92a09fb.
2. Fixes DifferentiableGraph outputs carrying the wrong requires_grad flag.

When fixing requires_grad on outputs from a DifferentiableGraph, the proper flag is retrieved from profiling information. We previously only retrieved the profiling information from the first profile node among all of an output's uses. However, when control flow is present, we need to iteratively search for a profile node whose profiling information is available, in case the first use is on an inactive code path. e.g.

```
graph(%0 : Tensor, %1 : Bool):
  ..., %2 : Tensor = prim::DifferentiableGraph_0(%0)
  %3 : Tensor = prim::If(%1)
    block0():
      %4 : Tensor = prim::DifferentiableGraph_1(%2)
      -> (%4)
    block1():
      %5 : Tensor = prim::DifferentiableGraph_2(%2)
      -> (%5)
  -> (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Tensor):
  ...
  %out : Tensor = aten::operation(...)
  ...
  return (..., %out)
with prim::DifferentiableGraph_1 = graph(%0 : Tensor):
  %temp : Tensor = prim::profile[profiled_type=Tensor](%0)
  ...
with prim::DifferentiableGraph_2 = graph(%0 : Tensor):
  %temp : Tensor = prim::profile[profiled_type=Float(...)](%0)
  ...
```

Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D29038773
Pulled By: Krovatkin
fbshipit-source-id: 6c0a851119f6b8f2f1afae5c74532407aae238fe
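A rough, hedged sketch of the kind of scripted program that hits this path (illustrative only; whether autodiff carves out these exact subgraphs depends on the profiling executor):

```python
import torch

@torch.jit.script
def f(x, flag: bool):
    y = x * 2   # candidate DifferentiableGraph output
    if flag:    # control flow: y's first use may sit on a path never taken
        return y.relu()
    return y.sigmoid()

x = torch.randn(3, requires_grad=True)
for _ in range(3):        # warm-up runs so profiling information is recorded
    out = f(x, False)
# The requires_grad flag on the subgraph output should be recovered from
# whichever profile node actually has profiling information.
assert out.requires_grad
```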
Summary: Gives an error message (rather than a segfault) if you forget `KernelScope()`.

Pull Request resolved: pytorch#59922
Reviewed By: bertmaher
Differential Revision: D29091303
Pulled By: jansel
fbshipit-source-id: a24ee2385cae1f210b0cbc3f8860948fc052b655
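A minimal sketch of the guarded pattern, assuming the NNC bindings live under `torch._C._te` as in the Python binding tests of this era (an internal, unstable API):

```python
import torch

te = torch._C._te  # internal NNC tensor-expression bindings

# Building NNC expressions requires a live KernelScope; forgetting it
# used to segfault and now raises a clear error instead.
scope = te.KernelScope()
one = te.ExprHandle.int(1)
two = one + one  # expression arithmetic happens inside the scope
```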
Summary: Makes it possible for the first registered parametrization to depend on a number of parameters rather than just one. Examples of these types of parametrizations are `torch.nn.utils.weight_norm` and low-rank parametrizations via the multiplication of an `n x k` tensor by a `k x m` tensor with `k <= m, n`.

Follows the plan outlined in pytorch#33344 (comment). A short summary of the idea: we call `right_inverse` when registering a parametrization to generate the tensors that we are going to save. If `right_inverse` returns a sequence of tensors, then we save them as `original0`, `original1`... If it returns a `Tensor` or a sequence of length 1, we save it as `original`. We only allow many-to-one parametrizations in the first parametrization registered; subsequent parametrizations need to be one-to-one.

There were a number of choices in the implementation. If `right_inverse` returns a sequence of parameters, then we unpack it in the forward. This allows writing code such as:

```python
class Sum(nn.Module):
    def forward(self, X, Y):
        return X + Y

    def right_inverse(self, Z):
        return Z, torch.zeros_like(Z)
```

rather than having to unpack a list or tuple manually within the `forward` function.

At the moment the errors are a bit all over the place. This is to avoid having to check some properties of `forward` and `right_inverse` when they are registered. I left it like this for now, but I believe it would be better to call these functions when they are registered to make sure the invariants hold and throw errors as soon as possible. The invariants are the following:

1. The following code should be well-formed:
```python
X = module.weight
Y = param.right_inverse(X)
assert isinstance(Y, Tensor) or isinstance(Y, collections.Sequence)
Z = param(Y) if isinstance(Y, Tensor) else param(*Y)
```
In other words, if `Y` is a `Sequence` of `Tensor`s (we also check that the elements of the sequence are Tensors), then its length matches the number of parameters `param.forward` accepts.
2. Always: `X.dtype == Z.dtype and X.shape == Z.shape`. This is to protect the user from shooting themselves in the foot, as it's too odd for a parametrization to change the metadata of a tensor.
3. If it's one-to-one: `X.dtype == Y.dtype`. This makes it possible to do `X.set_(Y)`, so that if a user first instantiates the optimiser and then registers the parametrisation, we reuse `X` and the user does not need to add a new parameter to the optimiser. Alas, this is not possible when the parametrisation is many-to-one. The current implementation of `spectral_norm` and `weight_norm` does not seem to care about this, so this would not be a regression. I left a warning in the documentation, though, as this case is a bit tricky.

I still need to go over the formatting of the documentation; I'll do that tomorrow.

Pull Request resolved: pytorch#58488
Reviewed By: soulitzer
Differential Revision: D29100708
Pulled By: albanD
fbshipit-source-id: b9e91f439cf6b5b54d5fa210ec97c889efb9da38
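A hedged usage sketch of the many-to-one flow described above (API names follow the commit message; note this change is reverted further down in this PR's commit list):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Sum(nn.Module):
    def forward(self, X, Y):
        return X + Y

    def right_inverse(self, Z):
        # Splits the current weight into two tensors; they are saved
        # on the module as original0 and original1.
        return Z, torch.zeros_like(Z)

linear = nn.Linear(3, 3)
parametrize.register_parametrization(linear, "weight", Sum())
# forward receives the saved tensors unpacked as (X, Y).
print(linear.weight.shape)                             # torch.Size([3, 3])
print(linear.parametrizations.weight.original0.shape)  # torch.Size([3, 3])
```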
Summary: This PR upgrades oneDNN to v2.2.3 (including the v2.2 and v2.2.3 changes), which has the following main CPU changes.

v2.2 changes:
- Improved performance of compute functionality for a future Intel Core processor with Intel AVX2 and Intel DL Boost instruction support (code name Alder Lake).
- Improved fp32 inner product forward propagation performance for processors with Intel AVX-512 support.
- Improved dnnl_gemm performance for cases with n=1 on all supported processors.

v2.2.3 changes:
- Fixed a bug in the int8 depthwise convolution primitive with groups and 1d spatial size for processors with Intel AVX-512 and Intel AVX2 support.
- Fixed a correctness issue for the PReLU primitive on Intel Processor Graphics.
- Fixed a correctness issue in reorder for blocked layouts with zero padding.
- Improved performance of weights reorders used by the BRGEMM-based convolution primitive for processors with Intel AVX-512 support.

More changes can be found at https://github.com/oneapi-src/oneDNN/releases. The ideep version used is pytorch-rls-v2.2.3; the oneDNN version used is v2.2.3.

Pull Request resolved: pytorch#57928
Reviewed By: bdhirsh
Differential Revision: D29037857
Pulled By: VitalyFedyunin
fbshipit-source-id: db74534858bdcf5d6c7dcf58e224fc756188bc31
Summary: Pull Request resolved: pytorch#59959

**Summary** This commit replaces the warning on the `torch.package` documentation page about the module not being publicly released (which will no longer be true as of 1.9) with one that warns about security issues caused by the use of the `pickle` module.

**Test Plan**
1) Built the docs locally.
2) Continuous integration.

<img width="877" alt="Captura de Pantalla 2021-06-14 a la(s) 11 22 05 a m" src="https://user-images.githubusercontent.com/4392003/121940300-c98cab00-cd02-11eb-99dc-08e29632079a.png">

Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D29108429
Pulled By: SplitInfinity
fbshipit-source-id: 3a0aeac0dc804a31203bc5071efb1c5bd6ef9725
Summary: The previous attempt is pytorch#57781. We now add two CUDA bindings to avoid using ctypes, fixing a Windows issue. However, we still use ctypes to allocate the stream and create its pointer (we could do this with a 0-dim tensor too if that feels better).

CC ezyang rgommers ngimel mruberry

Pull Request resolved: pytorch#59527
Reviewed By: albanD
Differential Revision: D29053062
Pulled By: ezyang
fbshipit-source-id: 661e7e58de98b1bdb7a0871808cd41d91fe8f13f
Summary: Fixes pytorch#3025

## Background

This PR implements a function similar to numpy's [`isin()`](https://numpy.org/doc/stable/reference/generated/numpy.isin.html#numpy.isin). The op supports integral and floating point types on CPU and CUDA (+ half & bfloat16 for CUDA). Inputs can be one of:
* (Tensor, Tensor)
* (Tensor, Scalar)
* (Scalar, Tensor)

Internally, one of two algorithms is selected based on the number of elements vs. test elements. The heuristic for deciding which algorithm to use is taken from [numpy's implementation](https://github.com/numpy/numpy/blob/fb215c76967739268de71aa4bda55dd1b062bc2e/numpy/lib/arraysetops.py#L575): if `len(test_elements) < 10 * len(elements) ** 0.145`, a naive brute-force checking algorithm is used; otherwise, a stable-sort-based algorithm is used. I've done some preliminary benchmarking on a devgpu to verify this heuristic, and determined for a limited set of tests that a power value of `0.407` instead of `0.145` is a better inflection point. For now, the heuristic has been left to match numpy's, but input is welcome on the best way to select it, or on whether it should be left the same as numpy's.

Tests are adapted from numpy's [isin and in1d tests](https://github.com/numpy/numpy/blob/7dcd29aaafe1ab8be4be04d3c793e5bcaf17459f/numpy/lib/tests/test_arraysetops.py).

Note: my locally generated docs look terrible for some reason, so I'm not including a screenshot of them until I figure out why.

Pull Request resolved: pytorch#53125

Test Plan:
```
python test/test_ops.py
# Ex: python test/test_ops.py TestOpInfoCPU.test_supported_dtypes_isin_cpu_int32
python test/test_sort_and_select.py
# Ex: python test/test_sort_and_select.py TestSortAndSelectCPU.test_isin_cpu_int32
```

Reviewed By: soulitzer
Differential Revision: D29101165
Pulled By: jbschlosser
fbshipit-source-id: 2dcc38d497b1e843f73f332d837081e819454b4e
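A short usage sketch of the new op:

```python
import torch

elements = torch.tensor([[1, 2], [3, 4]])
test_elements = torch.tensor([2, 3])

# True wherever an element of `elements` appears in `test_elements`.
mask = torch.isin(elements, test_elements)
# tensor([[False,  True],
#         [ True, False]])

# invert=True marks elements NOT found in test_elements.
inv_mask = torch.isin(elements, test_elements, invert=True)
```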
…ch#59948)

Summary: Pull Request resolved: pytorch#59948

1. We have two Interpreters: one for vanilla ops and one for acc ops. Some of the logic between them is similar, and in this diff we extract the shared logic into a Base Interpreter. This way, any future general feature change can benefit both Interpreters.
2. Make the TRT Interpreter not depend on concrete tensor args. We will use `InputTensorSpec` to create the necessary inputs for the acc tracer.
3. Add unit tests for the acc op converters.

Test Plan:
```
buck test mode/opt caffe2/torch/fb/fx2trt:test_linear
buck test mode/opt caffe2/torch/fb/fx2trt:test_batchnorm
buck test mode/opt caffe2/torch/fb/fx2trt:test_convolution
buck test mode/opt caffe2/torch/fb/fx2trt:test_reshape
buck test mode/opt caffe2/torch/fb/fx2trt:test_relu
buck test mode/opt caffe2/torch/fb/fx2trt:test_add
buck test mode/opt caffe2/torch/fb/fx2trt:test_maxpool
```

Reviewed By: jackm321
Differential Revision: D28749682
fbshipit-source-id: 830d845aede7203f6e56eb1c4e6776af197a0fc3
… inputs

Test Plan: revert-hammer
Differential Revision: D29100708 (pytorch@061e71b)
Original commit changeset: b9e91f439cf6
fbshipit-source-id: bff6d8a3d7b24f4beb976383912033c250d91a53
Summary: Pull Request resolved: pytorch#59840. Moves these tests to their own standalone file. No meaningful code changes.

ghstack-source-id: 131359162
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D29012664
fbshipit-source-id: 348870016509a6ed7e69240fa82bccef4a12d674
Fixes #58739 by adding support for the constants (`e`, `pi`, `nan`, `inf`) stipulated by the Python Array API.
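A quick sketch of the intended usage (assuming the constants are exposed at the top level of `torch`, mirroring `math` and NumPy):

```python
import math
import torch

# The four constants stipulated by the Array API standard; their
# values mirror the math module.
assert torch.pi == math.pi
assert torch.e == math.e
assert torch.inf == math.inf
assert math.isnan(torch.nan)  # nan compares unequal to itself
```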