[MPS] Add scatter_reduce.two by malfet · Pull Request #141948 · pytorch/pytorch · GitHub

Conversation

@malfet
Contributor

@malfet malfet commented Dec 3, 2024

This op, which has been requested 20+ times on #77764, is just a flavor of the out-of-the-box scatter-reduce, so all it does is redispatch to the existing implementation.
Unsupported dtype/reduction type combinations:

  • min/max for int64
  • min/max for int32 on macOS 14 or older
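To make the redispatched semantics concrete, here is a plain-Python sketch of a 1-D scatter-reduce along axis 0. This is purely illustrative (the PR itself redispatches to PyTorch's existing MPS scatter implementation), and the helper name is hypothetical:

```python
# Illustrative sketch of 1-D scatter-reduce semantics (not the MPS implementation).
def scatter_reduce_1d(data, indices, updates, mode):
    ops = {
        "sum": lambda a, b: a + b,
        "prod": lambda a, b: a * b,
        "min": min,
        "max": max,
    }
    reduce_op = ops[mode]
    out = list(data)  # include_self=True: reduction starts from the existing values
    for idx, upd in zip(indices, updates):
        out[idx] = reduce_op(out[idx], upd)
    return out

print(scatter_reduce_1d([1, 2, 3, 4], [0, 1, 2, 3], [10, 0, 10, 0], "min"))
# → [1, 0, 3, 0]
```

With `mode="min"` and in-range updates this should leave smaller existing values untouched, which is exactly what the int64 repro below fails to do.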

The following Swift code demonstrates the problem with the `scatterAlongAxis` MPS call:

```swift
import Metal
import MetalPerformanceShadersGraph

func scatterMPS(device: MTLDevice,
                inp_buf: MTLBuffer, upd_buf: MTLBuffer,
                idx_buf: MTLBuffer, out_buf: MTLBuffer,
                inp_elem: Int, upd_elem: Int) {
  let graph = MPSGraph()
  let inputPlaceholder = graph.placeholder(shape: [inp_elem as NSNumber], dataType: .int64, name: nil)
  let updatesPlaceholder = graph.placeholder(shape: [upd_elem as NSNumber], dataType: .int64, name: nil)
  let indicesPlaceholder = graph.placeholder(shape: [upd_elem as NSNumber], dataType: .int64, name: nil)
  let outNode = graph.scatterAlongAxis(0, data: inputPlaceholder, updates: updatesPlaceholder, indices: indicesPlaceholder, mode: .min, name: nil)
  let mpsInputBuffer = MPSGraphTensorData(inp_buf, shape: [inp_elem as NSNumber], dataType: .int64)
  let mpsUpdatesBuffer = MPSGraphTensorData(upd_buf, shape: [upd_elem as NSNumber], dataType: .int64)
  let mpsIndicesBuffer = MPSGraphTensorData(idx_buf, shape: [upd_elem as NSNumber], dataType: .int64)
  let mpsOutputBuffer = MPSGraphTensorData(out_buf, shape: [inp_elem as NSNumber], dataType: .int64)
  guard let queue = device.makeCommandQueue() else { fatalError("Can't make queue") }
  graph.run(with: queue, feeds: [inputPlaceholder: mpsInputBuffer,
                               updatesPlaceholder: mpsUpdatesBuffer,
                               indicesPlaceholder: mpsIndicesBuffer],
            targetOperations: nil, resultsDictionary: [outNode: mpsOutputBuffer])
}

func makeBufferWithValues(device: MTLDevice, values: [Int64]) -> MTLBuffer {
  guard let buf = device.makeBuffer(length: values.count * MemoryLayout<Int64>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") }
  let buf_data = buf.contents().assumingMemoryBound(to: Int64.self)
  for i in 0..<values.count {
    buf_data[i] = values[i]
  }
  return buf
}

guard let device = MTLCopyAllDevices().first else { fatalError("No Metal device found") }
print("Using device \(device.name)")

let inp_elem = 4
let upd_elem = 4
let inp_buf = makeBufferWithValues(device: device, values: [1, 2, 3, 4])
let upd_buf = makeBufferWithValues(device: device, values: [Int64.max - 1, Int64.max - 2, Int64.max >> 16, 11])
let idx_buf = makeBufferWithValues(device: device, values: [0, 1, 2, 3])
guard let out_buf = device.makeBuffer(length: inp_elem * MemoryLayout<Int64>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") }

scatterMPS(device: device,
           inp_buf: inp_buf, upd_buf: upd_buf,
           idx_buf: idx_buf, out_buf: out_buf,
           inp_elem: inp_elem, upd_elem: upd_elem)

let obuf_data = out_buf.contents().assumingMemoryBound(to: Int64.self)
for i in 0..<inp_elem {
    print("out_buf[\(i)] = \(obuf_data[i])")
}
```

It prints `4294967294, 4294967293, 4294967295, 4` instead of the expected `1, 2, 3, 4`,
whereas `torch.tensor([[1, 9223372036854775806], [2, 9223372036854775805], [3, 140737488355327], [4, 11]], dtype=torch.int64, device='mps').max(1)` yields the expected result.
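The wrong values look like the low 32 bits of the updates interpreted as signed int32. A minimal Python sketch reproduces the observed output under that hypothesis; note this is an assumption inferred from the numbers, not a confirmed root cause inside MPSGraph:

```python
# Hypothesis: MPSGraph truncates the int64 updates to signed int32 before the
# min, then zero-extends the 32-bit result back to 64 bits. Assumption only,
# inferred from the observed output.
def to_int32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    return ((x + 2**31) % 2**32) - 2**31

inputs = [1, 2, 3, 4]
# Int64.max - 1, Int64.max - 2, Int64.max >> 16, 11
updates = [2**63 - 2, 2**63 - 3, (2**63 - 1) >> 16, 11]

# min over truncated values, then zero-extend (mask to the low 32 bits):
result = [min(inp, to_int32(upd)) % 2**32 for inp, upd in zip(inputs, updates)]
print(result)  # → [4294967294, 4294967293, 4294967295, 4]
```

For example, `Int64.max - 1` truncates to `-2`, `min(1, -2)` is `-2`, and zero-extending `-2` gives `4294967294`, matching `out_buf[0]` above.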

@malfet malfet requested a review from kulinseth as a code owner December 3, 2024 06:51
@pytorch-bot

pytorch-bot bot commented Dec 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141948

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1ff4994 with merge base 78543e6:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Dec 3, 2024
@malfet malfet requested a review from Skylion007 December 3, 2024 06:52
@github-actions
Contributor

github-actions bot commented Dec 3, 2024

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.


@malfet malfet force-pushed the malfet/mps-add-scatter-reduce-two branch from b8c9b2d to c63b817 on December 3, 2024 18:07
@malfet malfet added this to the 2.6.0 milestone Dec 3, 2024
@malfet
Contributor Author

malfet commented Dec 4, 2024

@pytorchbot merge -f "MPS tests + Lint are green"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Pull Request resolved: pytorch#141948
Approved by: https://github.com/manuelcandales
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
@malfet malfet deleted the malfet/mps-add-scatter-reduce-two branch December 12, 2024 22:30
@atalman
Contributor

atalman commented Jan 21, 2025

Results of running the final RC for 2.6 on macOS 15.1.1:

python test_mps.py -v -k scatter_reduce
Fail to import hypothesis in common_utils, tests are not derandomized
test_output_grad_match_scatter_reduce_amax_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_amax_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_amin_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_amin_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_mean_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_mean_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_prod_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_prod_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_sum_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_sum_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_int64 (__main__.TestConsistencyCPU) ... expected failure
test_output_match_scatter_reduce_amax_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_int64 (__main__.TestConsistencyCPU) ... expected failure
test_output_match_scatter_reduce_amin_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int64 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int64 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int64 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_scatter_reduce (__main__.TestMPS) ... /Users/atalman/Downloads/release26/pytorch/test/test_mps.py:7655: UserWarning: The reduce argument of torch.scatter with Tensor src is deprecated and will be removed in a future PyTorch release. Use torch.scatter_reduce instead for more reduction options. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorAdvancedIndexing.cpp:234.)
  scatter_result = torch.scatter(x, dim=dim, index=idx, src=src, reduce=reduce_str)
ok

----------------------------------------------------------------------
Ran 55 tests in 5.660s

OK (expected failures=2)

The final RC running on macOS 14.4:

python test_mps.py -v -k scatter_reduce
Fail to import hypothesis in common_utils, tests are not derandomized
test_output_grad_match_scatter_reduce_amax_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_amax_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_amin_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_amin_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_mean_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_mean_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_prod_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_prod_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_sum_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_grad_match_scatter_reduce_sum_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_int32 (__main__.TestConsistencyCPU) ... expected failure
test_output_match_scatter_reduce_amax_cpu_int64 (__main__.TestConsistencyCPU) ... expected failure
test_output_match_scatter_reduce_amax_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amax_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_int32 (__main__.TestConsistencyCPU) ... expected failure
test_output_match_scatter_reduce_amin_cpu_int64 (__main__.TestConsistencyCPU) ... expected failure
test_output_match_scatter_reduce_amin_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_amin_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int64 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_mean_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int64 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_prod_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_bfloat16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_bool (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_float16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int64 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_int8 (__main__.TestConsistencyCPU) ... ok
test_output_match_scatter_reduce_sum_cpu_uint8 (__main__.TestConsistencyCPU) ... ok
test_scatter_reduce (__main__.TestMPS) ... /Users/ec2-user/test/pytorch/test/test_mps.py:7655: UserWarning: The reduce argument of torch.scatter with Tensor src is deprecated and will be removed in a future PyTorch release. Use torch.scatter_reduce instead for more reduction options. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorAdvancedIndexing.cpp:234.)
  scatter_result = torch.scatter(x, dim=dim, index=idx, src=src, reduce=reduce_str)
ok

----------------------------------------------------------------------
Ran 55 tests in 10.426s

OK (expected failures=4)

Labels

ciflow/mps Run MPS tests (subset of trunk) Merged release notes: mps Release notes category
