
General MPS op coverage tracking issue #77764

Open

albanD opened this issue May 18, 2022 · 1,729 comments
Labels
feature A request for a proper, new feature. module: mps Related to Apple Metal Performance Shaders framework tracker A tracking issue triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@albanD
Collaborator

albanD commented May 18, 2022

This issue is a centralized place to list and track work on adding support for new ops to the MPS backend.

PyTorch MPS Ops Project: a project to track all the ops for the MPS backend. PyTorch has a very large number of operators, so not all of them are implemented yet. We will be prioritizing new operators based on user feedback. If possible, please also provide a link to the network or use case where the op is being used.

As ops are requested, we will add them to the "To Triage" pool. If an operation has 3+ requests and its complexity/need warrants it, it will be moved to the "To be implemented" pool. If you want to work on adding support for such an op, feel free to comment below to get assigned one. Please avoid picking up an op that is already being worked on, as tracked in the "In progress" pool.

Link to the wiki for details on how to add these ops and example PRs.

MPS operators coverage matrix - The matrix covers most of the supported operators but is not exhaustive. Please look at the "In vx.x.x" column: if the box is green, the op implementation is included in the latest release; if the box is yellow, the op implementation is in the nightly build but has not yet been included in the latest release. Before you comment below, please check this matrix to make sure the operator you're requesting has not already been implemented in the nightly. More details can be found in the readme.
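As a quick way to check your own installed build, you can also probe an op by simply trying it on an MPS tensor and catching NotImplementedError; a minimal sketch (the helper name is hypothetical, not an official tool):

import torch

def op_runs_on_mps(fn, *args, **kwargs):
    # Without PYTORCH_ENABLE_MPS_FALLBACK=1, a missing MPS kernel raises NotImplementedError.
    if not torch.backends.mps.is_available():
        return False
    try:
        fn(*args, **kwargs)
        return True
    except NotImplementedError:
        return False

x = torch.rand(8, device="mps")
print(op_runs_on_mps(torch.cumsum, x, 0))  # probes aten::cumsum, for example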

cc @kulinseth @malfet @DenisVieriu97 @jhavukainen

@albanD albanD added feature A request for a proper, new feature. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: mps Related to Apple Metal Performance Shaders framework labels May 18, 2022
@albanD albanD changed the title General MPS op coverage issue General MPS op coverage tracking issue May 18, 2022
@philipturner

Are there any linear algebra ops not implemented in MPS that you have made custom shaders for? Any shaders I could "borrow" from your project (with full credit) and use in my own? Specifically, it would be helpful to have SVD and reverse-mode Cholesky operators.

@albanD
Collaborator Author

albanD commented May 18, 2022

Hey,

There are no custom shaders at the moment, as everything we needed for the basic networks we looked at was already provided by MPS (or a set of MPS ops). Also, required functions that are not in the hot path simply fall back to CPU for now.

It is mentioned here because it is something that can easily be done within the integration, but it is not something that is used today.

@pzelasko

I was testing a bunch of speech synthesis and vocoder models, and found the following operators missing so far:

  • aten::flip
  • aten::equal
  • aten::upsample_nearest1d.out

@Linux-cpp-lisp

One vote for a CPU fallback for torch.bincount.

Is there any reason, given the unified memory architecture, that every op not implemented on Metal cannot just fall back to the CPU implementation without memory copy operations? (Based, of course, on my 10,000ft view of the architecture, which I'm sure is wildly oversimplified.)
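Until a native kernel (or a transparent fallback) exists, a manual CPU round-trip is a possible stopgap; a minimal sketch (the helper name is hypothetical):

import torch

def bincount_via_cpu(t, minlength=0):
    # Run the existing CPU kernel, then copy the counts back to the input's device.
    return torch.bincount(t.cpu(), minlength=minlength).to(t.device)

idx = torch.randint(0, 10, (1000,), device="mps")
counts = bincount_via_cpu(idx)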

@richardburleigh

richardburleigh commented May 19, 2022

Tip for everyone:

Run your script with PYTORCH_ENABLE_MPS_FALLBACK=1, which will fall back to the CPU for unsupported ops.

I'm using a custom build which merges pull request #77791, so I'm not sure if this is included in the current build (Edit: It's not. You need to build PyTorch yourself with the pull request or trust an online build that includes it).
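Note that the variable is read when torch registers its MPS fallback, so it generally needs to be set before torch is imported; a minimal sketch in Python (the shell form PYTORCH_ENABLE_MPS_FALLBACK=1 python script.py is equivalent):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must come before the torch import

import torch  # unsupported MPS ops now warn and fall back to the CPU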

@gautierdag

Testing with some huggingface transformers code: + 1 vote for aten::cumsum.out
Tried with the fallback env var but doesn't seem to work for me.

@lhoenig
Contributor

lhoenig commented May 20, 2022

One missing op I ran into and haven't seen mentioned yet is aten::_unique2.
Edit: This error goes away when passing PYTORCH_ENABLE_MPS_FALLBACK=1 on the current main branch build. However, I instead get warnings

The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)

then

The dst MTL buffer in copy_to_mps is non-contiguous (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/native/mps/operations/Copy.mm:323.)

and finally the forward pass through my model crashes with

RuntimeError: Placeholder buffer size (7493632) is not large enough to contain the Tensor storage of size 14986944

On cpu it works fine. Could be #77886 I suppose.

@Willian-Zhang

Testing with some huggingface transformers code: + 1 vote for aten::cumsum.out
Tried with the fallback env var but doesn't seem to work for me.

+1
setting PYTORCH_ENABLE_MPS_FALLBACK=1 still results in:

NotImplementedError: Could not run 'aten::cumsum.out' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::cumsum.out' is only available for these backends: [Dense, Conjugate, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:3288 [kernel]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:12585 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:12118 [kernel]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]

@albanD
Collaborator Author

albanD commented May 20, 2022

@lhoenig could you open a new, separate issue for the CPU fallback failing for you?
The error seems to hint that you're moving a non-contiguous Tensor across devices. Making sure your Tensors are contiguous might help as a workaround.
We can continue this discussion in the new issue you will create.
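For illustration, a minimal sketch of that workaround (the tensor names here are hypothetical, not from the original report):

import torch

a = torch.rand(64, 64)
v = a.t()                          # a non-contiguous view
v_mps = v.contiguous().to("mps")   # make it contiguous before the device transfer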

@Willian-Zhang the fallback is ONLY available if you build from source right now. It will be in the nightly build tomorrow (May 21st).

@weiji14
Contributor

weiji14 commented May 20, 2022

Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post on how we can implement these into Pytorch? I'd love to give it a shot if it's not too hard.

@lhoenig
Contributor

lhoenig commented May 20, 2022

@albanD Yep, making the Tensors contiguous worked. But yet another issue revealed itself. I created #77977 and #78001.

@psobolewskiPhD

psobolewskiPhD commented May 20, 2022

I've got a non-supported op: aten::grid_sampler_2d

envs/pytorch-env/lib/python3.9/site-packages/torch/nn/functional.py:4172: UserWarning: The operator 'aten::grid_sampler_2d' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)

@thipokKub

Not supported

  • aten::l1_loss_backward.grad_input
  • aten::kl_div_backward

Code

import torch
import torch.nn as nn

X, y = torch.rand(16, 10).to("mps"), torch.rand(16, 1).to("mps")
model = nn.Linear(10, 1).to("mps")
criterion = nn.L1Loss()  # same failure with nn.KLDivLoss()
loss = criterion(model(X), y)
loss.backward()  # raises the error below

Output

NotImplementedError: Could not run 'aten::l1_loss_backward.grad_input' with arguments from the 'MPS' backend

@tw-ilson

Trying to use affine crop from torchvision, and found the operator aten::linspace.out does not seem to be implemented with the MPS backend

@nicolasbeglinger

nicolasbeglinger commented May 22, 2022

Trying to use MPS backend with pytorch geometric, and found the operator aten::index.Tensor is not yet implemented.

@feesta

feesta commented May 22, 2022

Found the operator 'aten::grid_sampler_2d' is not currently implemented for the MPS device.

@mooey5775

Would be great to add aten::adaptive_max_pool2d to the list - seems to be fairly common and for me useful in some point cloud architectures.

@RohanM
Contributor

RohanM commented May 23, 2022

I ran into this error with aten::count_nonzero.dim_IntList (via torch.count_nonzero()). I'll take a look at implementing this op with MPS.

@kisisjrlly

Voting for 'aten::linalg_lu_factor_ex.out'

@p-iosifidis
Contributor

I have found that aten::slow_conv_transpose2d.out is not implemented for the MPS device.

Code

device = torch.device('mps')
z = torch.randn(25, 100, 1, 1).to(device)
out = gen(z)  # gen: the generator network (uses transposed convolutions), defined elsewhere
show_tensor_images(out, num_images=25)
show_tensor_images(real, num_images=25, title='Real Images')

Error Message

NotImplementedError: The operator 'aten::slow_conv_transpose2d.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

I'd also like to propose transpose_conv3d
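For reference, a minimal self-contained repro of the same error (the layer shapes here are hypothetical stand-ins for the poster's generator, which is not shown in full):

import torch
import torch.nn as nn

device = torch.device("mps")
# In builds without a native MPS transposed-convolution kernel, ConvTranspose2d
# lowers to aten::slow_conv_transpose2d and raises the NotImplementedError above
# unless PYTORCH_ENABLE_MPS_FALLBACK=1 is set.
gen = nn.ConvTranspose2d(100, 3, kernel_size=4, stride=2).to(device)
z = torch.randn(25, 100, 1, 1).to(device)
out = gen(z)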

@exdysa

exdysa commented Feb 28, 2025

Added

Tip

If your error text is below this line, you don't need to vote. These have been added.

macOS PyTorch is at _**2.8**_. Update your PyTorch:

pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

aten::_adaptive_avg_pool2d
aten::_adaptive_avg_pool2d_backward
aten::_batch_norm_with_update
aten::_batch_norm_with_update.out
aten::_cdist_forward
aten::_convert_weight_to_int4pack
aten::_copy_from
aten::_copy_from_and_resize
aten::_efficientzerotensor
aten::_fft_c2c
aten::_fft_c2c.out
aten::_fft_c2r
aten::_fft_c2r.out
aten::_fft_r2c
aten::_fft_r2c.out
aten::_fused_adam
aten::_fused_adam.tensor_lr
aten::_fused_adamw
aten::_fused_adamw.tensor_lr
aten::_fused_sgd
aten::_fused_sgd.tensor_lr
aten::_histogramdd_bin_edges
aten::_histogramdd_from_bin_cts
aten::_histogramdd_from_bin_tensors
aten::_index_put_impl
aten::_local_scalar_dense
aten::_log_softmax_backward_data.out
aten::_log_softmax.out
aten::_lstm_mps
aten::_mps_convolution
aten::_mps_convolution_transpose
aten::_native_batch_norm_legit
aten::_native_batch_norm_legit.no_stats
aten::_native_batch_norm_legit.no_stats_out
aten::_native_batch_norm_legit.out
aten::_prelu_kernel
aten::_prelu_kernel_backward
aten::_reshape_alias
aten::_scaled_dot_product_attention_math_for_mps
aten::_softmax_backward_data.out
aten::_softmax.out
aten::_unique2
aten::_upsample_nearest_exact1d_backward.grad_input
aten::_upsample_nearest_exact1d.out
aten::_upsample_nearest_exact2d_backward.grad_input
aten::_upsample_nearest_exact2d.out
aten::_weight_int4pack_mm
aten::_weight_int8pack_mm
aten::_weight_norm_interface
aten::_weight_norm_interface_backward
aten::abs.out
aten::acos.out
aten::acosh.out
aten::adaptive_avg_pool2d.out
aten::adaptive_max_pool2d_backward.grad_input
aten::adaptive_max_pool2d.out
aten::add.out
aten::addbmm
aten::addbmm.out
aten::addcdiv.out
aten::addcmul.out
aten::addmm.out
aten::addmv.out
aten::addr
aten::addr.out
aten::all.all_out
aten::all.out
aten::amax.out
aten::amin.out
aten::aminmax.out
aten::any.all_out
aten::any.out
aten::arange.start_out
aten::argmax.out
aten::argmin.out
aten::as_strided
aten::asin.out
aten::asinh.out
aten::atan.out
aten::atan2.out
aten::atanh.out
aten::avg_pool2d_backward.grad_input
aten::avg_pool2d.out
aten::baddbmm.out
aten::batch_norm_backward
aten::bernoulli.float
aten::bernoulli.out
aten::bernoulli.Tensor
aten::binary_cross_entropy
aten::binary_cross_entropy_backward
aten::binary_cross_entropy_backward.grad_input
aten::binary_cross_entropy.out
aten::bincount
aten::bitwise_and.Tensor_out
aten::bitwise_left_shift.Tensor_out
aten::bitwise_not.out
aten::bitwise_or.Tensor_out
aten::bitwise_right_shift.Tensor_out
aten::bitwise_xor.Tensor_out
aten::bmm.out
aten::bucketize.Scalar
aten::bucketize.Tensor
aten::bucketize.Tensor_out
aten::cat.out
aten::ceil.out
aten::clamp_max.out
aten::clamp_max.Tensor_out
aten::clamp_min.out
aten::clamp_min.Tensor_out
aten::clamp.out
aten::clamp.Tensor_out
aten::complex.out
aten::conj_physical.out
aten::constant_pad_nd
aten::copysign.out
aten::cos.out
aten::cosh.out
aten::count_nonzero.dim_IntList
aten::cumprod.out
aten::cumsum.out
aten::digamma.out
aten::div.out
aten::div.out_mode
aten::dot
aten::elu_backward.grad_input
aten::elu.out
aten::embedding_dense_backward
aten::empty_strided
aten::empty.memory_format
aten::eq.Scalar_out
aten::eq.Tensor_out
aten::equal
aten::erf.out
aten::erfinv.out
aten::exp.out
aten::exp2.out
aten::expm1.out
aten::exponential
aten::eye.m_out
aten::eye.out
aten::fill.Scalar
aten::fill.Tensor
aten::flip
aten::floor_divide
aten::floor_divide.out
aten::floor_divide.Tensor
aten::floor.out
aten::fmax.out
aten::fmin.out
aten::fmod.Tensor_out
aten::frac.out
aten::gather.out
aten::ge.Scalar_out
aten::ge.Tensor_out
aten::gelu_backward.grad_input
aten::gelu.out
aten::glu_backward
aten::glu_backward.grad_input
aten::glu.out
aten::grid_sampler_2d
aten::gt.Scalar_out
aten::gt.Tensor_out
aten::hardsigmoid_backward.grad_input
aten::hardsigmoid.out
aten::hardswish
aten::hardswish_backward
aten::hardswish.out
aten::hardtanh
aten::hardtanh_backward
aten::hardtanh_backward.grad_input
aten::hardtanh.out
aten::histc
aten::histc.out
aten::histogram.bin_ct
aten::histogram.bin_ct_out
aten::histogram.bins_tensor
aten::histogram.bins_tensor_out
aten::huber_loss
aten::huber_loss_backward.out
aten::huber_loss.out
aten::hypot.out
aten::i0.out
aten::im2col
aten::im2col.out
aten::index_add.out
aten::index_fill.int_Scalar
aten::index_fill.int_Tensor
aten::index_select
aten::index_select.out
aten::index.Tensor_out
aten::is_set_to
aten::isin.Tensor_Tensor_out
aten::isnan
aten::isneginf.out
aten::isposinf.out
aten::le.Scalar_out
aten::le.Tensor_out
aten::leaky_relu_backward.grad_input
aten::leaky_relu.out
aten::lerp.Scalar_out
aten::lerp.Tensor_out
aten::lgamma.out
aten::linalg_cross.out
aten::linalg_inv_ex.inverse
aten::linalg_lu_factor
aten::linalg_lu_factor.out
aten::linalg_solve_triangular
aten::linalg_solve_triangular.out
aten::linalg_vector_norm.out
aten::linear
aten::linear_backward
aten::linspace.out
aten::log_sigmoid_backward
aten::log_sigmoid_backward.grad_input
aten::log_sigmoid_forward
aten::log_sigmoid_forward.output
aten::log.out
aten::log10.out
aten::log1p.out
aten::log2.out
aten::logaddexp.out
aten::logaddexp2.out
aten::logical_and.out
aten::logical_not.out
aten::logical_or.out
aten::logical_xor.out
aten::logit
aten::logit_backward.grad_input
aten::logit.out
aten::lshift.Scalar
aten::lshift.Tensor
aten::lstm_mps_backward
aten::lt.Scalar_out
aten::lt.Tensor_out
aten::masked_fill.Scalar
aten::masked_fill.Tensor
aten::masked_scatter
aten::masked_select
aten::masked_select.out
aten::max
aten::max_pool2d
aten::max_pool2d_backward
aten::max_pool2d_with_indices_backward.grad_input
aten::max_pool2d_with_indices.out
aten::max.dim_max
aten::maximum.out
aten::mean.out
aten::median
aten::median.dim_values
aten::min
aten::min.dim_min
aten::minimum.out
aten::mish_backward
aten::mish.out
aten::mm.out
aten::mps_convolution_backward
aten::mps_convolution_transpose_backward
aten::mse_loss_backward
aten::mse_loss_backward.grad_input
aten::mse_loss.out
aten::mul.out
aten::multinomial
aten::multinomial.out
aten::nan_to_num.out
aten::nansum
aten::nansum.out
aten::native_batch_norm
aten::native_batch_norm_backward
aten::native_batch_norm.out
aten::native_group_norm
aten::native_group_norm_backward
aten::native_layer_norm
aten::native_layer_norm_backward
aten::ne.Scalar_out
aten::ne.Tensor_out
aten::neg.out
aten::nextafter.out
aten::nll_loss_backward.grad_input
aten::nll_loss_forward.output
aten::nll_loss2d_backward
aten::nll_loss2d_backward.grad_input
aten::nll_loss2d_forward
aten::nll_loss2d_forward.output
aten::nonzero
aten::nonzero.out
aten::norm.dtype_out
aten::norm.out
aten::normal
aten::normal.float_Tensor
aten::normal.float_Tensor_out
aten::normal.Tensor_float
aten::normal.Tensor_float_out
aten::normal.Tensor_Tensor
aten::normal.Tensor_Tensor_out
aten::permute
aten::pixel_shuffle
aten::pixel_unshuffle
aten::polar.out
aten::polygamma.out
aten::pow.Scalar_out
aten::pow.Tensor_Scalar_out
aten::pow.Tensor_Tensor_out
aten::prod
aten::prod.int_out
aten::random
aten::random.from
aten::random.to
aten::randperm.generator_out
aten::range.out
aten::reciprocal.out
aten::reflection_pad1d_backward.grad_input
aten::reflection_pad1d.out
aten::reflection_pad2d
aten::reflection_pad2d_backward
aten::reflection_pad2d_backward.grad_input
aten::reflection_pad2d.out
aten::reflection_pad3d_backward.grad_input
aten::reflection_pad3d.out
aten::relu
aten::remainder.Scalar_Tensor
aten::remainder.Tensor_out
aten::renorm.out
aten::repeat
aten::repeat_interleave.Tensor
aten::replication_pad1d_backward.grad_input
aten::replication_pad1d.out
aten::replication_pad2d_backward
aten::replication_pad2d_backward.grad_input
aten::replication_pad2d.out
aten::replication_pad3d_backward
aten::replication_pad3d_backward.grad_input
aten::replication_pad3d.out
aten::resize
aten::roll
aten::round.out
aten::rshift.Scalar
aten::rshift.Tensor
aten::rsqrt.out
aten::scatter_add.out
aten::scatter_reduce.two_out
aten::scatter.reduce_out
aten::scatter.src_out
aten::scatter.value_out
aten::scatter.value_reduce_out
aten::searchsorted.Scalar
aten::searchsorted.Scalar_out
aten::searchsorted.Tensor
aten::searchsorted.Tensor_out
aten::set
aten::set.source_Storage
aten::set.source_Storage_storage_offset
aten::set.source_Tensor
aten::sgn.out
aten::sigmoid_backward.grad_input
aten::sigmoid.out
aten::sign.out
aten::signbit.out
aten::silu_backward.grad_input
aten::silu.out
aten::sin.out
aten::sinh.out
aten::smooth_l1_loss_backward.grad_input
aten::smooth_l1_loss.out
aten::softplus_backward.grad_input
aten::softplus.out
aten::softshrink_backward.grad_input
aten::softshrink.out
aten::sort.values_stable
aten::special_i1.out
aten::special_spherical_bessel_j0.out
aten::sqrt.out
aten::std_mean.correction
aten::std.correction
aten::sub.out
aten::sum.IntList_out
aten::tan.out
aten::tanh_backward.grad_input
aten::tanh.out
aten::threshold_backward.grad_input
aten::threshold.out
aten::topk.values
aten::trace
aten::triangular_solve.X
aten::tril_indices
aten::tril.out
aten::triu_indices
aten::triu.out
aten::trunc.out
aten::unfold
aten::unfold_backward
aten::uniform
aten::unique_consecutive
aten::unique_dim_consecutive
aten::upsample_bicubic2d_backward.grad_input
aten::upsample_bicubic2d.out
aten::upsample_bilinear2d_backward.grad_input
aten::upsample_bilinear2d.out
aten::upsample_linear1d_backward.grad_input
aten::upsample_linear1d.out
aten::upsample_nearest1d_backward.grad_input
aten::upsample_nearest1d.out
aten::upsample_nearest2d_backward.grad_input
aten::upsample_nearest2d.out
aten::var_mean.correction
aten::var.correction
aten::view
aten::view_as_complex
aten::view_as_real
aten::where.self
aten::where.self_out
aten::xlogy.OutTensor
aten::zero

Tip

If your error text is above this line, you don't need to vote. These have been added.

macOS PyTorch is at _**2.8**_. Update your PyTorch:

pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Not Yet Added

Important

If your op is below this line, it has 15-72 votes and is in the work queue. Voting won't speed up the queue. You have to wait, or do code & math & stuff to help move the queue along.

You can also try:

os.environ['PYTORCH_ENABLE_MPS_FALLBACK']='1'
aten::_ctc_loss
aten::_embedding_bag
aten::_embedding_bag_dense_backward
aten::_embedding_bag_forward_only
aten::_embedding_bag_per_sample_weights_backward
aten::_fused_sdp_choice
aten::_linalg_eigh.eigenvalues
aten::_linalg_eigvals
aten::linalg_lu_factor_ex.out
aten::_linalg_solve_ex.result (assignee: jhavukainen)
aten::_linalg_svd_out
aten::_logcumsumexp (assignee: jhavukainen)
aten::_masked_softmax
aten::_masked_softmax_backward
aten::_nested_from_padded
aten::_nested_tensor_from_mask
aten::_nested_tensor_from_mask_left_aligned
aten::_nested_tensor_size
aten::_nested_tensor_strides
aten::_nested_tensor_storage_offsets
aten::_sample_dirichlet
aten::_segment_reduce_backward
aten::_slow_conv2d_forward
aten::_standard_gamma
aten::_standard_gamma_grad
aten::_symeig_helper
aten::_upsample_bicubic2d_aa_backward.grad_input
aten::_upsample_bicubic2d_aa.out
aten::adaptive_avg_pool3d_backward.grad_input
aten::adaptive_avg_pool3d.out
aten::angle
aten::angle.out
aten::avg_pool3d_backward.grad_input
aten::bitwise_left_shift_out
aten::cholesky_inverse.out (assignee: jhavukainen)
aten::cholesky_solve.out
aten::cholesky.out
aten::cummax.out
aten::cummin.out
aten::embedding_renorm_
aten::grid_sampler_2d_backward
aten::grid_sampler_3d
aten::grid_sampler_3d_backward
aten::hardshrink_backward.grad_input
aten::hardshrink.out
aten::igamma.out
aten::igammac.out
aten::kthvalue.values
aten::linalg_cholesky_ex (assignee: jhavukainen)
aten::linalg_eig
aten::linalg_eig.out
aten::linalg_householder_product
aten::linalg_inv_out_helper
aten::linalg_lstsq.out
aten::linalg_lu.out
aten::linalg_matrix_exp
aten::linalg_qr.out
aten::log_normal_
aten::max_unpool2d (assignee: skotapati)
aten::max_unpool3d
aten::multilabel_margin_loss_forward
aten::mvlgamma.out
aten::native_group_norm
aten::native_group_norm_backward
aten::segment_reduce
aten::slow_conv_transpose2d.out
aten::slow_conv3d_forward
aten::unique_dim
c10d::allgather_
c10d::allreduce_
c10d::broadcast
max_pool3d
max_pool3d_with_indices
nn.Conv3D (S)
torchvision::deform_conv2d
[the ops below are marked unsupported in the MPS support matrix but do not appear in the tracker?]
aten::_linalg_det.result
aten::_upsample_bilinear2d_aa.out
aten::lu_unpack.out
aten::round.decimals_out
aten::sinc.out
aten::special_entr.out
aten::special_xlog1py.out
aten::special_zeta.out

Important

If your op is above this line, it has 15-72 votes and is in the work queue. Voting won't speed up the queue. You have to wait, or do code & math & stuff to help move the queue along.

You can also try:

os.environ['PYTORCH_ENABLE_MPS_FALLBACK']='1'

Current queue: 85

Do code & math & stuff to help move the queue along.

Warning

Not Yet Added Or Queued

Active developers, read this section carefully. I'm not sure how many of these are genuine vs. mistakes in tracking, mistakes in updating, typos, etc.:

14 votes aten::_index_put_impl_
14 votes aten::native_dropout
11 votes aten::nanmedian.dim_values
8 votes aten::_linalg_slogdet.sign
5 votes aten::kl_div_backward
3 votes aten::quantize_per_tensor
3 votes torchaudio::forced_align
1 vote aten::transpose_conv3d
1 vote aten::avg_pool3d.out
1 vote aten::upsample_nearest3d.vec (my humble request q_q)

From what I can tell, voting at this point is meaningless since no issues are moving and traffic is growing. Any volunteers to help?

Source: tracker, queue

Important

macOS PyTorch is at _**2.8**_. Update your PyTorch:

pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Note

Please read the above before commenting ^^^^^^^

@rangetsushigure

NotImplementedError: The operator 'aten::unfold_backward' is not currently implemented for the MPS device.

@Mariadjb

Mariadjb commented Mar 8, 2025

The operator 'aten::scatter_reduce.two_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

@exdysa

exdysa commented Mar 8, 2025

NotImplementedError: The operator 'aten::unfold_backward' is not currently implemented for the MPS device.

It is; update your PyTorch.

The operator 'aten::scatter_reduce.two_out' is not currently implemented

It is; update your PyTorch.

(The full "Added" / "Not Yet Added" lists were re-posted here; they match the Feb 28, 2025 comment above, except that aten::linalg_cholesky_ex has moved from "Not Yet Added" to "Added".)

@devbanu

devbanu commented Mar 17, 2025

How much work is it to actually implement something like aten::avg_pool3d.out?

Can one do it by generalizing from the existing implementation of aten::avg_pool2d.out, or by following some other similar implementation that has already been done?

I suspect there is a snag, otherwise it would have been implemented.
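While the native kernel is missing, a CPU round-trip is a possible stopgap; a minimal sketch (the helper name is hypothetical, assuming 5-D NCDHW input on the MPS device):

import torch
import torch.nn.functional as F

def avg_pool3d_via_cpu(x, kernel_size, **kwargs):
    # Run the existing CPU kernel, then copy the result back to the original device.
    return F.avg_pool3d(x.cpu(), kernel_size, **kwargs).to(x.device)

x = torch.rand(2, 3, 8, 16, 16, device="mps")
y = avg_pool3d_via_cpu(x, kernel_size=2)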

@Fax0rz0r

Hello,
I'm using Python 3.9 on a MacBook Pro (Intel 2.4 GHz i9), macOS Sonoma 14.3.
Trying to run DepthAnything V2.

Had an issue with 'aten::upsample_bicubic2d.out' and then did the PyTorch update via the link above, but the error still exists.

With PYTORCH_ENABLE_MPS_FALLBACK=1 it runs, but I'm getting RuntimeError: Invalid buffer size: 3.54 GB at the quality I need.

How can it be fixed?
The terminal log is in the attachment.
Thanks

MPS_error.txt

@exdysa

exdysa commented Mar 19, 2025

MPS_error.txt

Requirement already satisfied: torch in /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages (2.2.2)

Your torch is v2.2, not 2.7.

First, install a newer Python; don't use or upgrade the Python provided with the Mac. Second, use a venv. That way you can keep torch and other components separate from the system installation, and if you make a mistake you don't mess up the Python installation. where python will be helpful to know which one you're using (the top entry). Then you can try again.

pytorchmergebot pushed a commit that referenced this issue Mar 24, 2025
Third most voted op from #77764

Tests were deleted because they are covered by the regular test_output_match tests so those were redundant and were added in the last PR before the nanmedian dim version would be implemented

Pull Request resolved: #149680
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <[email protected]>
@MarcusEddie

Hi all,
I have met this issue:
[rank0]: NotImplementedError: The operator **'c10d::broadcast_' is not currently implemented for the MPS device**. If you want this op to be considered for addition please comment on https://github.com/pytorch/pytorch/issues/141287 and mention use-case, that resulted in missing op as well as commit hash 2236df1770800ffea5697b11b0bb0d910b2e59e1. As a temporary fix, you can set the environment variable "PYTORCH_ENABLE_MPS_FALLBACK=1" to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

My code runs on an MBP with the M4 chip; the following are the packages I am using:

Name Version Build Channel
torch 2.6.0 pypi_0 pypi
torchaudio 2.6.0 pypi_0 pypi
torchvision 0.21.0 pypi_0 pypi

Has this issue been fixed now?

thanks

@exdysa

exdysa commented Apr 16, 2025

Hi all, I have met this issue: `[rank0]: NotImplementedError: The operator 'c10d::broadcast_'
Has this issue been fixed now?

Please try again with torch 2.8

pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

If not, you can try PYTORCH_ENABLE_MPS_FALLBACK=1; otherwise you will have to code a workaround.

amathewc pushed a commit to amathewc/pytorch that referenced this issue Apr 17, 2025
Third most voted op from pytorch#77764

Tests were deleted because they are covered by the regular test_output_match tests so those were redundant and were added in the last PR before the nanmedian dim version would be implemented

Pull Request resolved: pytorch#149680
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <[email protected]>
@malfet
Contributor

malfet commented Apr 17, 2025

[rank0]: NotImplementedError: The operator **'c10d::broadcast_' is not currently implemented for the MPS

@MarcusEddie can you create an issue about it, maybe with a minimal reproducer? I.e., I'm trying to understand why you need a distributed op on a single-GPU system.

@forkyguo

forkyguo commented Apr 24, 2025

NotImplementedError: The operator 'torchvision::nms' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

@dbl001

dbl001 commented Apr 25, 2025

The operator 'aten::linalg_svd' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications.

@hanqin

hanqin commented Apr 26, 2025

NotImplementedError: The operator 'aten::_linalg_solve_ex.result' is not currently implemented for the MPS device.

@jlchereau

jlchereau commented Apr 26, 2025

After

pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Got

NotImplementedError: The operator 'aten::linalg_cholesky_ex.L' is not currently implemented for the MPS device.

os.environ['PYTORCH_ENABLE_MPS_FALLBACK']='1' does not help.

Can be reproduced with https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb on Apple Silicon M4 pro with macOS (latest). Cell #18, before Step 9.

Cf. #77764 (comment)

Just wanted to let you know that this is fixed in pytorch v2.7 (tested on Mac OS 15.4.1 with M4 Pro and python 3.12.10). Thank you.

@ifsheldon
Contributor

+1 for aten::linalg_svd. I need it in quantum computing simulations; it's one of the most heavily used ops there.
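Until a native kernel lands, an explicit CPU round-trip works at a performance cost; a minimal sketch:

import torch

A = torch.rand(64, 64, device="mps")
# No MPS kernel for aten::linalg_svd yet, so run it on the CPU and move the factors back.
U, S, Vh = torch.linalg.svd(A.cpu())
U, S, Vh = U.to(A.device), S.to(A.device), Vh.to(A.device)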

@johnlockejrr

Pytorch 2.7.0 on Mac M4

NotImplementedError: The operator 'aten::_ctc_loss' is not currently implemented for the MPS device. If you want this op to be considered for addition please comment on https://github.com/pytorch/pytorch/issues/141287 and mention use-case, that resulted in missing op as well as commit hash 134179474539648ba7dee1317959529fbd0e7f89. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

@BenjaminDEMAILLE

Hi! Do you have any news about Apple Silicon support?

@MarcusEddie

[rank0]: NotImplementedError: The operator **'c10d::broadcast_' is not currently implemented for the MPS

@MarcusEddie can you create an issue about it, maybe with a minimal reproducer? I.e., I'm trying to understand why you need a distributed op on a single-GPU system.

Hi @malfet,
I'm using this script:

torchrun --nproc_per_node 1 \
  -m FlagEmbedding.finetune.embedder.encoder_only.m3 \
  --model_name_or_path /Users/ncl/python/FlagNew/models/bge-m3 \

to finetune something on my Mac. I tried with the nightly torch version (2.8.0.dev20250416) and it still failed.
The code repo is here: https://github.com/FlagOpen/FlagEmbedding
