From NumPy to PyTorch
Mike Ruberry
software engineer @ Facebook
Outline
- NumPy and working with tensors
- PyTorch and hardware accelerators, autograd, and computational graphs
- Adding NumPy operators to PyTorch
- When PyTorch is Different from NumPy
- Lessons learned and future work
NumPy and working with tensors
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
Tensor creation
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
Addition
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets
Matrix multiplication
1 >> np.fft.fft(np.exp(2j * np.pi * np.arange(8) / 8))
array([-3.44509285e-16+1.14423775e-17j,
        8.00000000e+00-8.11483250e-16j,
        2.33486982e-16+1.22464680e-16j,
        0.00000000e+00+1.22464680e-16j,
        9.95799250e-17+2.33486982e-16j,
        0.00000000e+00+7.66951701e-17j,
        1.14423775e-17+1.22464680e-16j,
        0.00000000e+00+1.22464680e-16j])
2 >> A = np.array([[1,-2j],[2j,5]])
3 >> np.linalg.cholesky(A)
array([[1.+0.j, 0.+0.j],
[0.+2.j, 1.+0.j]])
More Complicated NumPy Snippets
NumPy Operators: Composites and Primitives
A composite (sinc, in Python) and a primitive (npy_copysign, in C):
def sinc(x):
    x = np.asanyarray(x)
    y = pi * where(x == 0, 1.0e-20, x)
    return sin(y) / y
double npy_copysign(double x, double y)
{
    npy_uint32 hx, hy;
    GET_HIGH_WORD(hx, x);
    GET_HIGH_WORD(hy, y);
    SET_HIGH_WORD(x, (hx & 0x7fffffff) | (hy & 0x80000000));
    return x;
}
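From a user's perspective the split is invisible: both kinds of operators are called the same way. A quick, hedged illustration (printed values are approximate):

import numpy as np

print(np.sinc(0.5))            # composite, written in Python: ~0.63662 (= 2/pi)
print(np.copysign(3.0, -1.0))  # primitive, written in C: -3.0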
PyTorch and hardware accelerators, autograd, and computational graphs
1 >> import numpy as np
2 >> a = np.array(((1, 2), (3, 4)))
array([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets (Again)
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = np.array(((-1, -2), (-3, -4)))
4 >> np.add(a, b)
array([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets to PyTorch Snippets
Tensor creation
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = torch.tensor(((-1, -2), (-3, -4)))
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]])
5 >> np.matmul(a, b)
array([[ -7, -10],
[-15, -22]])
Addition
Simple NumPy Snippets to PyTorch Snippets
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = torch.tensor(((-1, -2), (-3, -4)))
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]])
5 >> torch.matmul(a, b)
tensor([[ -7, -10],
[-15, -22]])
Simple NumPy Snippets to PyTorch Snippets
Matrix multiplication
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)))
tensor([[1, 2],
[3, 4]])
3 >> b = torch.tensor(((-1, -2), (-3, -4)))
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]])
5 >> torch.matmul(a, b)
tensor([[ -7, -10],
[-15, -22]])
Simple PyTorch Snippets
1 >> np.fft.fft(np.exp(2j * np.pi * np.arange(8) / 8))
array([-3.44509285e-16+1.14423775e-17j,
        8.00000000e+00-8.11483250e-16j,
        2.33486982e-16+1.22464680e-16j,
        0.00000000e+00+1.22464680e-16j,
        9.95799250e-17+2.33486982e-16j,
        0.00000000e+00+7.66951701e-17j,
        1.14423775e-17+1.22464680e-16j,
        0.00000000e+00+1.22464680e-16j])
2 >> A = np.array([[1,-2j],[2j,5]])
3 >> np.linalg.cholesky(A)
array([[1.+0.j, 0.+0.j],
[0.+2.j, 1.+0.j]])
More Complicated NumPy Snippets (Again)
1 >> torch.fft.fft(torch.exp(2j * math.pi * torch.arange(8) / 8))
tensor([ 3.2584e-07+3.1787e-08j,  8.0000e+00+4.8023e-07j,
        -3.2584e-07+3.1787e-08j, -1.6859e-07+3.1787e-08j,
        -3.8941e-07-2.0663e-07j,  1.3691e-07-1.9412e-07j,
         3.8941e-07-2.0663e-07j,  1.6859e-07+3.1787e-08j])
1 >> A = torch.tensor([[1,-2j],[2j,5]])
2 >> torch.linalg.cholesky(A)
tensor([[1.+0.j, 0.+0.j],
        [0.+2.j, 1.+0.j]])
More Complicated PyTorch Snippets
1 >> t = torch.tensor((1, 2, 3))
2 >> a = t.numpy()
array([1, 2, 3])
3 >> b = np.array((-1, -2, -3))
4 >> result = a + b
array([0, 0, 0])
5 >> torch.from_numpy(result)
tensor([0, 0, 0])
PyTorch and NumPy Interoperability
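One detail worth knowing: for CPU tensors these conversions are zero-copy, so the tensor and the array share memory. A minimal sketch of that behavior:

import numpy as np
import torch

t = torch.tensor((1, 2, 3))
a = t.numpy()        # shares memory with t (CPU tensors only)
a[0] = 100           # mutate the NumPy array...
print(t)             # tensor([100,   2,   3]) ...and the tensor sees the change

b = torch.from_numpy(np.array((-1, -2, -3)))  # also zero-copy in the other direction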
Does PyTorch have EVERY NumPy operator?
- No!
- NumPy has a lot of operators: A LOT
- Many of them are rarely used, niche, deprecated, or in need of deprecation
- But PyTorch does have hundreds of NumPy operators
1 >> import torch
2 >> a = torch.tensor(((1, 2), (3, 4)), device='cuda')
tensor([[1, 2],
[3, 4]], device='cuda:0')
3 >> b = torch.tensor(((-1, -2), (-3, -4)), device='cuda')
4 >> torch.add(a, b)
tensor([[0, 0],
[0, 0]], device='cuda:0')
5 >> torch.matmul(a.float(), b.float())
tensor([[ -7., -10.],
[-15., -22.]], device='cuda:0')
Simple PyTorch Snippets on CUDA
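Since not every machine has a GPU, real code usually picks the device at runtime. A small, hedged variant of the snippet above:

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
a = torch.tensor(((1, 2), (3, 4)), device=device)
b = torch.tensor(((-1, -2), (-3, -4)), device=device)
print(torch.add(a, b))
print(torch.matmul(a.float(), b.float()))  # cast to float as above; integer matmul isn't supported on the CUDA backend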
1 >> a = torch.tensor((1., 2.), requires_grad=True)
2 >> b = torch.tensor((3., 4.))
3 >> result = (a * b).sum()
4 >> result.backward()
5 >> a.grad
tensor([3., 4.])
Autograd in PyTorch
def sinc(x):
    y = math.pi * torch.where(x == 0, 1.0e-20, x)
    return torch.sin(y) / y

scripted_sinc = torch.jit.script(sinc)

graph(%x.1 : Tensor):
  %1 : float = prim::Constant[value=3.1415926535897931]
  %3 : int = prim::Constant[value=0]
  %5 : float = prim::Constant[value=9.9999999999999995e-21]
  %4 : Tensor = aten::eq(%x.1, %3)
  %7 : Tensor = aten::where(%4, %5, %x.1)
  %y.1 : Tensor = aten::mul(%7, %1)
  %10 : Tensor = aten::sin(%y.1)
  %12 : Tensor = aten::div(%10, %y.1)
  return (%12)
Computational Graphs in PyTorch
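The scripted function can be called like the original Python function, and its graph can be inspected directly. A self-contained, hedged sketch (using torch.full_like for the fill value so the same code also runs eagerly):

import math
import torch

def sinc(x):
    y = math.pi * torch.where(x == 0, torch.full_like(x, 1.0e-20), x)
    return torch.sin(y) / y

scripted_sinc = torch.jit.script(sinc)

t = torch.linspace(-3.0, 3.0, steps=7)
print(torch.allclose(scripted_sinc(t), sinc(t)))  # True: scripting preserves the numerics
print(scripted_sinc.graph)                        # prints a graph like the one above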
1 >> t = torch.randn(10)
2 >> linear_layer = torch.nn.Linear(10, 5)
3 >> linear_layer(t)
tensor([ 0.0066, 0.2467, -0.0137, -0.4091, -1.1756],
grad_fn=<AddBackward0>)
Deep Learning in PyTorch
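Combined with autograd and the optimizers in torch.optim, modules like this become full training loops. A minimal, hedged sketch of a single gradient step on random data:

import torch

model = torch.nn.Linear(10, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

x = torch.randn(32, 10)       # a random batch of inputs
target = torch.randn(32, 5)   # random regression targets

loss = loss_fn(model(x), target)
optimizer.zero_grad()
loss.backward()               # autograd fills in parameter gradients
optimizer.step()              # SGD updates the weights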
PyTorch as NumPy+
- While PyTorch doesn’t have every NumPy operator, for those it supports we
can think of it as NumPy PLUS:
- Support for hardware accelerators, like GPUs and TPUs
- Support for autograd
- Support for computational graphs
- Support for deep learning
- A C++ API
- … and many additional features (visualization, distributed training, …)
- PyTorch also has additional operators that NumPy does not
PyTorch Behind the Scenes
- To recap, NumPy had…
- Composite operators (typically implemented in Python)
- Primitive operators (implemented in C++)
- And PyTorch has...
- Composite operators (implemented in C++)
- Primitive operators (implemented in C++, CPU intrinsics, and CUDA)
- Computational graphs (executed by torchscript or XLA)
- Plus autograd formulas for differentiable operations
def sinc(x):
    x = np.asanyarray(x)
    y = pi * where(x == 0, 1.0e-20, x)
    return sin(y) / y
Sinc in NumPy (reminder)
static void sinc_kernel(TensorIteratorBase& iter) {
  AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(
      kBFloat16, iter.common_dtype(), "sinc_cpu", [&]() {
        cpu_kernel(
            iter,
            [=](scalar_t a) -> scalar_t {
              if (a == scalar_t(0)) {
                return scalar_t(1);
              } else {
                scalar_t product = c10::pi<scalar_t> * a;
                return std::sin(product) / product;
              }
            });
      });
}
Sinc in PyTorch, CPU kernel
Sinc in PyTorch, Autograd Formula
name: sinc(Tensor self) -> Tensor
self: grad * ((M_PI * self * (M_PI * self).cos() - (M_PI * self).sin()) / (M_PI * self * self)).conj()
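The formula above is just d/dx [sin(πx)/(πx)] = (πx·cos(πx) - sin(πx))/(πx²), conjugated for complex inputs. A quick, hedged numerical check against autograd (assuming torch.sinc is available in your build):

import math
import torch

x = torch.tensor([0.3, 0.7, 1.5], requires_grad=True)
(autograd_grad,) = torch.autograd.grad(torch.sinc(x).sum(), x)

pi_x = math.pi * x.detach()
manual_grad = (pi_x * torch.cos(pi_x) - torch.sin(pi_x)) / (pi_x * x.detach())
print(torch.allclose(autograd_grad, manual_grad, atol=1e-6))  # True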
Adding NumPy Operators to PyTorch
Porting an operator from NumPy
- Need to write a C++ implementation
- Possibly a CPU kernel or a CUDA kernel
- Need to write an autograd formula (if the op is differentiable)
- Need to write comprehensive tests (more on this in a moment)
… why do we bother?
Porting an operator from NumPy
- Need to write a C++ implementation
- Possibly a CPU kernel or a CUDA kernel
- Made easier with the C++ “TensorIterator” architecture
- Need to write an autograd formula (if the op is differentiable)
- Simplified by allowing users to write Pythonic YAML formulas
- Need to write comprehensive tests (more on this in a moment)
- Significant coverage automated with PyTorch’s OpInfo metadata and test generation
framework
PyTorch’s test matrix
- Tensor properties:
- Datatype (long, float, complexfloat, etc.)
- Device (CPU, CUDA, TPU, etc.)
- Differentiable operations support autograd
- Operations need to work in computational graphs
- Operations have “function,” “method” and “inplace” variants
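For example, addition exists in all three forms, and the tests check that they agree. A minimal sketch:

import torch

a = torch.tensor((1., 2.))
b = torch.tensor((3., 4.))

print(torch.add(a, b))  # function variant
print(a.add(b))         # method variant, returns a new tensor
a.add_(b)               # inplace variant (trailing underscore), modifies a
print(a)                # tensor([4., 6.])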
OpInfo for torch.mul
OpInfo('mul',
       aliases=('multiply',),
       dtypes=all_types_and_complex_and(
           torch.float16, torch.bfloat16, torch.bool),
       sample_inputs_func=sample_inputs_binary_pwise)
OpInfo for torch.sin
UnaryUfuncInfo('sin',
               ref=np.sin,
               dtypes=all_types_and_complex_and(
                   torch.bool, torch.bfloat16),
               dtypesIfCUDA=all_types_and_complex_and(
                   torch.bool, torch.half),
               handles_large_floats=False,
               handles_complex_extremals=False,
               safe_casts_outputs=True,
               decorators=(precisionOverride({torch.bfloat16: 1e-2}),))
OpInfo test template
@ops(unary_ufuncs)
def test_contig_vs_transposed(self, device, dtype, op):
    contig = make_tensor((789, 357),
                         device=device, dtype=dtype,
                         low=op.domain[0], high=op.domain[1])
    non_contig = contig.T
    self.assertTrue(contig.is_contiguous())
    self.assertFalse(non_contig.is_contiguous())
    torch_kwargs, _ = op.sample_kwargs(device, dtype, contig)
    self.assertEqual(op(contig, **torch_kwargs).T,
                     op(non_contig, **torch_kwargs))
Instantiated tests for torch.sin
@ops(unary_ufuncs)
def test_contig_vs_transposed(self, device, dtype, op):
test_contig_vs_transposed_sin_cuda_complex64
test_contig_vs_transposed_sin_cuda_float16
test_contig_vs_transposed_sin_cuda_float32
test_contig_vs_transposed_sin_cuda_int64
test_contig_vs_transposed_sin_cuda_uint8
test_contig_vs_transposed_sin_cpu_complex64
test_contig_vs_transposed_sin_cpu_float16
test_contig_vs_transposed_sin_cpu_float32
test_contig_vs_transposed_sin_cpu_int64
test_contig_vs_transposed_sin_cpu_uint8
Example properties validated for every operator
- Autograd is implemented correctly
- Tested using finite differences (see the gradcheck sketch after this list)
- The operation works with torchscript and torch.fx
- The operation’s function, method, and inplace variants all compute the same
operation
- One big caveat: can’t automatically test correctness except for special
classes of operators (like unary ufuncs)
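PyTorch exposes this style of finite-difference checking as torch.autograd.gradcheck. A minimal, hedged sketch of checking one differentiable operator by hand (gradcheck expects double-precision inputs with requires_grad set):

import torch

x = torch.randn(5, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(torch.sinc, (x,)))  # True if analytic and numeric gradients agree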
Features of PyTorch’s test generator
- Works with pytest and unittest
- Dynamically identifies available device types
- Allows for device type-specific logic for setup and teardown
- Extensible by other packages adding new device types (like PyTorch/XLA)
- Provides a central “source of truth” for each operator’s functionality
- Makes it easy to test new features with every PyTorch operator
When PyTorch is Different from NumPy
np.reciprocal vs torch.reciprocal

NumPy:
1 >> a = np.array((1, 2, 3))
2 >> np.reciprocal(a)
array([1, 0, 0])

PyTorch:
1 >> t = torch.tensor((1, 2, 3))
2 >> torch.reciprocal(t)
tensor([1.0000, 0.5000, 0.3333])
np.linalg.eig vs torch.linalg.eig

NumPy:
1 >> a = np.diag(np.array((1., 2, 3)))
2 >> w, v = np.linalg.eig(a)
(array([1., 2., 3.]),
 array([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]]))

PyTorch:
1 >> t = torch.diag(torch.tensor((1., 2, 3)))
2 >> w, v = torch.linalg.eig(t)
torch.return_types.linalg_eig(
eigenvalues=tensor([1.+0.j, 2.+0.j, 3.+0.j]),
eigenvectors=tensor([[1.+0.j, 0.+0.j, 0.+0.j],
                     [0.+0.j, 1.+0.j, 0.+0.j],
                     [0.+0.j, 0.+0.j, 1.+0.j]]))
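torch.linalg.eig always returns complex eigenvalues and eigenvectors, even for real inputs. If you want NumPy-style real results you can drop a zero imaginary part yourself, or use torch.linalg.eigh for symmetric/Hermitian matrices; a hedged sketch:

import torch

t = torch.diag(torch.tensor((1., 2, 3)))

w, v = torch.linalg.eig(t)      # complex results, as above
if torch.all(w.imag == 0):
    w = w.real                  # drop the zero imaginary part by hand

w_sym, v_sym = torch.linalg.eigh(t)  # real eigenvalues for symmetric/Hermitian inputs
print(w, w_sym)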
Ordering complex numbers in NumPy vs. PyTorch

NumPy:
1 >> a = np.array((complex(1, 2), complex(2, 1)))
2 >> np.amax(a)
(2+1j)
3 >> np.sort(a)
array([1.+2.j, 2.+1.j], dtype=complex64)

PyTorch:
1 >> t = torch.tensor((complex(1, 2), complex(2, 1)))
2 >> torch.amax(t)
RUNTIME ERROR
3 >> torch.sort(t)
RUNTIME ERROR
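PyTorch refuses to choose an ordering for complex numbers, but the ordering can be made explicit. A hedged workaround that sorts by real part (NumPy's lexicographic ordering also breaks ties on the imaginary part, which this sketch ignores):

import torch

t = torch.tensor((complex(1, 2), complex(2, 1)))

order = torch.argsort(t.real)    # choose an explicit key: the real part
print(t[order])                  # tensor([1.+2.j, 2.+1.j])
print(t[torch.argmax(t.real)])   # tensor(2.+1.j)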
Principled discrepancies
- The PyTorch community seems OK with these principled discrepancies
- Different behavior must be very similar to NumPy’s behavior
- It’s OK to not support some things, as long as there are other mechanisms to do them
- PyTorch also has systematic discrepancies with NumPy that pass without
comment
- Type promotion
- Functions vs. method variants
- Returning scalars vs tensors
Lessons Learned and Future Work
Recap
- NumPy and PyTorch are popular Python packages with operators that manipulate
tensors
- PyTorch implements many of NumPy’s operators, and extends them with support for
hardware accelerators, autograd, and other systems that support modern scientific
computing and deep learning
- The PyTorch community wants both the functionality and familiarity these operators
provide
- But it’s OK with principled differences
- To make implementing all these operators tractable, PyTorch has had to develop
architecture supporting C++ and CUDA implementations, autograd formulas and
testing
Lessons Learned
- Do the work to engage your community and listen carefully to their feedback
- At first it wasn’t clear whether people just wanted the functionality of NumPy operators, but our
community has clarified they also want fidelity
- Focus on developer efficiency
- Be clear about your own principles when implementing operators from
another project
Future Work
- Prioritize deprecating and updating the few PyTorch operators with
significantly different behavior from their NumPy counterparts
- Make success criteria clearer: implementing every NumPy operator is
impractical and inadvisable
- The new Python Array API may solve this problem
- More focus on SciPy functionality, including SciPy’s special module, linear
algebra module, and optimizers
Thank you!
