Move mps_linear forward to use MPS kernels directly instead of MPSGraph #152210

jhavukainen · 2025-04-25T20:40:18Z

This PR changes mps_linear to use MPSNDArrays and call the MPS kernel directly instead of going through MPSGraph. It also adds a caching mechanism for reusing MPS kernels, since creating a kernel object carries a small overhead of its own.

The improvement is relatively larger for small inputs, where the MPSGraph overhead makes up a bigger share of the operation's total execution time, but the speedup shows for both small and large input sizes, as expected.
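As an illustration of the caching idea described above, here is a minimal sketch of a lookup-or-create cache. It is not the code from this PR: `CachedKernel`, `KernelCache`, `lookupOrCreate`, and the string key are hypothetical names chosen for the example, and a plain struct stands in for the real kernel object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a kernel object that is costly to create.
struct CachedKernel {
  explicit CachedKernel(std::string k) : key(std::move(k)) {}
  // Non-copyable, so the cache remains the single owner of each kernel.
  CachedKernel(const CachedKernel&) = delete;
  CachedKernel& operator=(const CachedKernel&) = delete;
  std::string key;
};

// Hypothetical cache: creates a kernel on first request and reuses it afterwards.
class KernelCache {
 public:
  using Factory = std::function<std::unique_ptr<CachedKernel>(const std::string&)>;

  CachedKernel& lookupOrCreate(const std::string& key, const Factory& make) {
    auto it = kernels_.find(key);
    if (it == kernels_.end()) {
      // Cache miss: pay the creation cost once and remember the result.
      auto kernel = make(key);
      assert(kernel != nullptr);
      it = kernels_.emplace(key, std::move(kernel)).first;
    }
    return *it->second;
  }

  std::size_t size() const { return kernels_.size(); }

 private:
  std::unordered_map<std::string, std::unique_ptr<CachedKernel>> kernels_;
};
```

Repeated calls with the same key return the same object, so the factory runs once per distinct key. A real cache would also need a key that captures everything the kernel depends on (for example dtype and shapes) and a thread-safety story, both of which this sketch leaves out.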

mps_linear before the changes:

input shapes: f32:[1,1,20], f32:[1,20]
torch.linear time: <torch.utils.benchmark.utils.common.Measurement object at 0x109d67110>
func(*args, **kwargs)
  Median: 199.29 us
  IQR:    9.56 us (196.71 to 206.27)
  979 measurements, 1 runs per measurement, 1 thread

input shapes: f32:[1,1,5120], f32:[13284,5120]
torch.linear time: <torch.utils.benchmark.utils.common.Measurement object at 0x1063b4510>
func(*args, **kwargs)
  Median: 979.29 us
  IQR:    25.29 us (964.83 to 990.13)
  205 measurements, 1 runs per measurement, 1 thread

mps_linear after the changes:

input shapes: f32:[1,1,20], f32:[1,20]
torch.linear time: <torch.utils.benchmark.utils.common.Measurement object at 0x10693a190>
func(*args, **kwargs)
  Median: 176.08 us
  IQR:    15.02 us (172.42 to 187.44)
  1103 measurements, 1 runs per measurement, 1 thread

input shapes: f32:[1,1,5120], f32:[13284,5120]
torch.linear time: <torch.utils.benchmark.utils.common.Measurement object at 0x10d524dd0>
func(*args, **kwargs)
  Median: 952.56 us
  IQR:    15.63 us (945.47 to 961.10)
  210 measurements, 1 runs per measurement, 1 thread
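
For reference, the medians above can be turned into relative and absolute savings with a small helper. This is only an arithmetic sketch; `relativeReduction` and `absoluteSaving` are hypothetical names and not part of the PR.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the "before" median that the change removed.
double relativeReduction(double before_us, double after_us) {
  assert(before_us > 0.0);
  return (before_us - after_us) / before_us;
}

// Absolute time saved per call, in microseconds.
double absoluteSaving(double before_us, double after_us) {
  return before_us - after_us;
}
```

With the reported medians, the small input goes from 199.29 us to 176.08 us (23.21 us saved, about 11.6%), and the large input goes from 979.29 us to 952.56 us (26.73 us saved, about 2.7%). The similar absolute savings fit the explanation of a roughly fixed per-call overhead being removed, although the reported IQRs are of the same order as the savings, so the exact percentages should be read loosely.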

cc @kulinseth @albanD @malfet @DenisVieriu97

pytorch-bot · 2025-04-25T20:40:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152210

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 20 Pending, 1 Unrelated Failure

As of commit 8727017 with merge base 1d3e8f3:

NEW FAILURE - The following job has failed:

pull / linux-focal-py3_9-clang9-xla / build (gh)
ninja: build stopped: subcommand failed

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, ephemeral.linux.2xlarge) (gh) (#144480)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

aten/src/ATen/native/mps/OperationUtils.h

kulinseth

Looks good.

jhavukainen · 2025-05-08T15:51:55Z

@pytorchbot rebase

pytorchmergebot · 2025-05-08T15:53:34Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-05-08T15:53:36Z

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/152210/head returned non-zero exit code 1

Rebasing (1/5)
hint: Recursive merging with submodules currently only supports trivial cases.
hint: Please manually handle the merging of each conflicted submodule.
hint: This can be accomplished with the following steps:
hint:  - come back to superproject and run:
hint:
hint:       git add third_party/cutlass
hint:
hint:    to record the above merge or update
hint:  - resolve any other conflicts in the superproject
hint:  - commit the resulting index in the superproject
hint:
hint: Disable this message with "git config set advice.submoduleMergeConflict false"
Failed to merge submodule third_party/cutlass (not checked out)
CONFLICT (submodule): Merge conflict in third_party/cutlass
error: could not apply 657bf1e0646... Adding a direct MPS kernel path to linear op and MPS kernel caching mechanism for improved perf.
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 657bf1e0646... Adding a direct MPS kernel path to linear op and MPS kernel caching mechanism for improved perf.

Raised by https://github.com/pytorch/pytorch/actions/runs/14910649119

malfet · 2025-05-09T21:57:16Z

aten/src/ATen/native/mps/operations/Linear.mm

+  bool is_macos_15_or_newer = is_macos_13_or_newer(MacOSVersion::MACOS_VER_15_0_PLUS);
+  if (is_macos_15_or_newer) {
+    _mps_linear_nograph(input, weight, bias, output);
+  } else {


IMO it would be nicer to replace the if/else with an early return, i.e. if (...) { ...; return; } (doing it now).
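
The suggested restructuring can be sketched as follows. The function names and the `trace` parameter are hypothetical stand-ins used only to make the example self-contained; they are not the functions in Linear.mm.

```cpp
#include <string>

// Hypothetical stand-ins for the two code paths; they only record which one ran.
static void linearDirectKernel(std::string& trace) { trace += "direct"; }
static void linearViaGraph(std::string& trace) { trace += "graph"; }

// Before: the fallback path lives inside an else branch.
void linearNested(bool use_direct_kernel, std::string& trace) {
  if (use_direct_kernel) {
    linearDirectKernel(trace);
  } else {
    linearViaGraph(trace);
  }
}

// After: return early, leaving the fallback path unindented.
void linearEarlyReturn(bool use_direct_kernel, std::string& trace) {
  if (use_direct_kernel) {
    linearDirectKernel(trace);
    return;
  }
  linearViaGraph(trace);
}
```

Both versions behave identically; the early return mainly keeps the longer fallback body at one less level of indentation.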

malfet

LGTM, though it feels like a bit too much boilerplate code; I will look into that later.

malfet · 2025-05-09T23:39:36Z

@pytorchbot merge -f "Lint + MPS are green"

pytorchmergebot · 2025-05-09T23:41:13Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

hvaara · 2025-06-05T20:10:08Z

Looks like this fixed #132332 🎉

jhavukainen added the module: mps Related to Apple Metal Performance Shaders framework label Apr 25, 2025

jhavukainen requested review from kulinseth and malfet as code owners April 25, 2025 20:40

pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Apr 25, 2025

pytorchbot added the open source label Apr 25, 2025

malfet reviewed Apr 25, 2025

aten/src/ATen/native/mps/OperationUtils.h (resolved)

soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 28, 2025

jhavukainen force-pushed the dev/joona/mps_linear branch from 49b4351 to 76a1b48 Compare May 2, 2025 19:46

jhavukainen requested a review from malfet May 6, 2025 21:17

jhavukainen force-pushed the dev/joona/mps_linear branch from acaeb1d to 06a28bb Compare May 8, 2025 00:23

kulinseth approved these changes May 8, 2025

jhavukainen added 4 commits May 8, 2025 08:57

Adding a direct MPS kernel path to linear op and MPS kernel caching mechanism for improved perf.

df2965b

Delete copy constructor and assignment from MPSCachedKernel. Include MPSSequoiaOps header to Linear. Add check for macos version to use correct code path to avoid breaking previous OS.

33b4b7c

Use the proper way of addressing strides when creating MPSNDArray

239ae65

Get encoder after call to endKernelCoalescing

f855f95

jhavukainen force-pushed the dev/joona/mps_linear branch from 06a28bb to f855f95 Compare May 8, 2025 15:58

malfet reviewed May 9, 2025

malfet added 2 commits May 9, 2025 15:13

Return early

e7d2aca

Fix lint

931b73e

malfet approved these changes May 9, 2025

Fix typo

8727017

pytorchmergebot added the merging label May 9, 2025

pytorchmergebot closed this in 4e24ee7 May 9, 2025

pytorchmergebot added Merged and removed merging labels May 9, 2025

hvaara mentioned this pull request Jun 5, 2025

[MPS] Memory leak in nn.Linear #132332

Closed

hvaara mentioned this pull request Jun 5, 2025

MaxPool2D memory leakage on device MPS #125217

Open

malfet mentioned this pull request Jun 14, 2025

[MPS] Performance regression and visual bug with ComfyUI Flux dev since nightly 20250510 #155797

Closed

github-actions bot deleted the dev/joona/mps_linear branch July 6, 2025 02:19
