gderossi · 2026-02-11T13:53:09Z

Both cuSolver and hipSolver implement potrs, so this PR removes the MAGMA path entirely, adds a deprecation warning, and updates the tests to skip only when cuSolver is missing. Thanks @nikitaved for the performance improvements in #175898! Note that cholesky_inverse depends on the cholesky_solve functions, so #174681 should be merged before this.

Benchmarking script:

import torch
import torch.utils.benchmark as benchmark

from itertools import product

results = []

batches = [(), (16,), (64,)]
sizes = [16, 128, 512, 2048]

for b, n in product(batches, sizes):
    shape = b + (n, n)
    print(f"Testing shape={shape}")
    label = "torch.cholesky_solve"
    sub_label = f"{shape}"
    # Build a symmetric positive-definite A so that its Cholesky factor exists
    A = torch.rand(*shape, device="cuda")
    A = A @ A.mT + torch.eye(n, device="cuda")
    B = torch.rand(*shape, device="cuda")
    L = torch.linalg.cholesky(A)
    stmt = "torch.cholesky_solve(B, L)"
    # Time the same statement under each linalg backend
    for backend in ("magma", "cusolver"):
        torch.backends.cuda.preferred_linalg_library(backend)
        # warm-up
        for _ in range(5):
            exec(stmt)

        results.append(benchmark.Timer(
            stmt=stmt,
            globals={'L': L, 'B': B},
            label=label,
            sub_label=sub_label,
            description=backend,
        ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.print()

Benchmarking results on RTX Pro 6000:

[------------ torch.cholesky_solve -----------]
                        |   magma   |  cusolver | speedup
1 threads: ------------------------------------ |
      (16, 16)          |   1579.0  |     21.3  | 74.1
      (128, 128)        |   1593.5  |     50.1  | 31.8
      (512, 512)        |   1731.0  |    226.3  | 7.6
      (2048, 2048)      |   2956.9  |   1587.6  | 1.9
      (16, 16, 16)      |     28.7  |     28.3  | 1.0
      (16, 128, 128)    |    601.8  |    121.8  | 4.9
      (16, 512, 512)    |   3246.6  |    735.5  | 4.4
      (16, 2048, 2048)  |  23108.0  |  11620.9  | 2.0
      (64, 16, 16)      |     29.5  |     28.2  | 1.0
      (64, 128, 128)    |    644.3  |    140.5  | 4.6
      (64, 512, 512)    |   4571.2  |   1840.4  | 2.5
      (64, 2048, 2048)  |  65104.7  |  42007.7  | 1.5

Times are in microseconds (us).
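
The benchmarking script above does not appear to compute the `speedup` column itself. Assuming that column is simply the MAGMA time divided by the cuSolver time, rounded to one decimal place, a few rows of the table can be reproduced with the short sketch below. The `timings` list and the `speedup` helper are illustrative names, not part of the PR; the numbers are copied from the table.

```python
# (shape, magma_us, cusolver_us) copied from the results table above.
timings = [
    ((16, 16), 1579.0, 21.3),
    ((2048, 2048), 2956.9, 1587.6),
    ((16, 16, 16), 28.7, 28.3),
    ((64, 2048, 2048), 65104.7, 42007.7),
]


def speedup(magma_us, cusolver_us):
    # Assumption: speedup = MAGMA time / cuSolver time, rounded to 1 decimal.
    return round(magma_us / cusolver_us, 1)


speedups = {shape: speedup(m, c) for shape, m, c in timings}

for shape, value in speedups.items():
    print(f"{str(shape):<18} {value:>5.1f}x")
```

Under that assumption the ratios agree with the table for these rows: the largest gain is on small unbatched inputs, and the small batched cases are roughly at parity.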

cc @nikitaved @eqy

pytorch-bot · 2026-02-11T13:53:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174769


✅ You can merge normally! (1 Unrelated Failure)

As of commit 409a19f with merge base 596dbc5:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 3, linux.rocm.gpu.gfx950.4) (gh) (similar failure)
test/distributed/_composable/test_replicate_with_fsdp.py::ReplicateTest::test_train_parity_2d_mlp

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@gderossi

…batched inputs (#175898)

cuSOLVER status: `potrsBatched` only supports `nrhs=1`, while looped `potrs` is slower than two batched calls to `solve_triangular`.

This PR should unblock #174769, since it makes the cuSOLVER backend faster than MAGMA (the cuSOLVER triangular solve is faster).

Benchmarks:
```
[------ torch.cholesky_solve -----]
                        |    new    |    old
1 threads: ---------------------------
      (16, 16, 16)      |     29.7  |    157.6
      (16, 128, 128)    |    120.8  |    713.2
      (16, 512, 512)    |    730.9  |   3718.6
      (16, 2048, 2048)  |  11594.7  |  25689.6
      (64, 16, 16)      |     29.7  |    608.8
      (64, 128, 128)    |    139.3  |   2831.9
      (64, 512, 512)    |   1828.7  |  15173.4
      (64, 2048, 2048)  |  41961.1  | 102721.2

Times are in microseconds (us).
```

Thanks to @gderossi for the benchmarks comparing cuSOLVER kernels vs MAGMA, and his analysis of cuSOLVER's limitations. All this ultimately led to creating this PR.

Pull Request resolved: #175898
Approved by: https://github.com/eqy, https://github.com/Aidyn-A
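
To illustrate the "two batched calls to `solve_triangular`" idea at the Python level: with A = L Lᴴ, solving A X = B amounts to a forward substitution L Y = B followed by a back substitution Lᴴ X = Y. The sketch below is not the C++ implementation from #175898; it only checks the equivalence with public PyTorch functions, on CPU and in float64 so that it runs without a GPU. The helper name `cholesky_solve_via_triangular` and the problem sizes are made up for illustration.

```python
import torch


def cholesky_solve_via_triangular(B, L):
    """Solve A X = B given the lower Cholesky factor L of A (A = L @ L^H)."""
    # Forward substitution: L @ Y = B
    Y = torch.linalg.solve_triangular(L, B, upper=False)
    # Back substitution: L^H @ X = Y
    return torch.linalg.solve_triangular(L.mH, Y, upper=True)


torch.manual_seed(0)
batch, n, nrhs = 4, 8, 3  # several right-hand sides per batch entry

# Symmetric positive-definite batch, built the same way as in the benchmark
A = torch.rand(batch, n, n, dtype=torch.float64)
A = A @ A.mT + n * torch.eye(n, dtype=torch.float64)
B = torch.rand(batch, n, nrhs, dtype=torch.float64)
L = torch.linalg.cholesky(A)

X = cholesky_solve_via_triangular(B, L)
X_ref = torch.cholesky_solve(B, L)

# Largest entry of A @ X - B, as a sanity check on the solution
residual = (A @ X - B).abs().max().item()
print(torch.allclose(X, X_ref), residual)
```

Both triangular solves accept batched inputs with any number of right-hand-side columns, which is the case the commit message says `potrsBatched` does not cover.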

nikitaved

👍

eqy · 2026-03-03T16:25:17Z

@pytorchmergebot merge

eqy

looks like rocm build is broken?

pytorchmergebot · 2026-03-03T16:40:06Z

Merge failed

Reason: Approvers from one of the following sets are needed:

Linear Algebra (mruberry, lezcano, IvanYashchuk)
superuser (pytorch/metamates)
Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
Core Maintainers (soumith, gchanan, ezyang, malfet, albanD, ...)

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

eqy · 2026-03-03T16:42:28Z

@pytorchmergebot merge

pytorchmergebot · 2026-03-03T16:44:51Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

pytorchmergebot · 2026-03-03T16:45:04Z

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Failing merge rule: Core Maintainers

Aidyn-A · 2026-03-04T06:07:28Z

The build failure is real. Here it is attempting to call apply_cholesky_solve, which you have removed.

Okay, it makes sense, as the PR #174681 should land first.

Yep, that failure is not a surprise. #174681 has now landed so hopefully everything should work.

eqy · 2026-03-04T18:33:31Z

@pytorchmergebot label ciflow/trunk ciflow/rocm-mi300

gderossi · 2026-03-04T21:29:57Z

@pytorchmergebot merge

pytorchmergebot · 2026-03-04T21:43:25Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).



@nikitaved

…r unconditionally (pytorch#174769)

Both cuSolver and hipSolver implement potrs, so this just removes the MAGMA path entirely and adds a deprecation warning, and updates the tests to skip only if missing cuSolver. Thanks @nikitaved for the performance improvements in pytorch#175898! Note that cholesky_inverse depends on cholesky_solve functions and so pytorch#174681 should be merged before this.

Benchmarking script:
```python
import torch
import torch.utils.benchmark as benchmark

from itertools import product

results = []

batches = [(), (16,), (64,)]
sizes = [16, 128, 512, 2048]

for b, n in product(batches, sizes):
    shape = b + (n, n)
    print(f"Testing shape={shape}")
    label = "torch.cholesky_solve"
    sub_label = f"{shape}"
    A = torch.rand(*shape, device="cuda")
    A = A @ A.mT + torch.eye(n, device="cuda")
    B = torch.rand(*shape, device="cuda")
    L = torch.linalg.cholesky(A)
    stmt = "torch.cholesky_solve(B, L)"
    for backend in ("magma", "cusolver"):
        torch.backends.cuda.preferred_linalg_library(backend)
        # warm-up
        for _ in range(5):
            exec(stmt)

        results.append(benchmark.Timer(
            stmt=stmt,
            globals={'L': L, 'B': B},
            label=label,
            sub_label=sub_label,
            description=backend,
        ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.print()
```

Benchmarking results on RTX Pro 6000:
```
[------------ torch.cholesky_solve -----------]
                        |   magma   |  cusolver | speedup
1 threads: ------------------------------------ |
      (16, 16)          |   1579.0  |     21.3  | 74.1
      (128, 128)        |   1593.5  |     50.1  | 31.8
      (512, 512)        |   1731.0  |    226.3  | 7.6
      (2048, 2048)      |   2956.9  |   1587.6  | 1.9
      (16, 16, 16)      |     28.7  |     28.3  | 1.0
      (16, 128, 128)    |    601.8  |    121.8  | 4.9
      (16, 512, 512)    |   3246.6  |    735.5  | 4.4
      (16, 2048, 2048)  |  23108.0  |  11620.9  | 2.0
      (64, 16, 16)      |     29.5  |     28.2  | 1.0
      (64, 128, 128)    |    644.3  |    140.5  | 4.6
      (64, 512, 512)    |   4571.2  |   1840.4  | 2.5
      (64, 2048, 2048)  |  65104.7  |  42007.7  | 1.5

Times are in microseconds (us).
```

Pull Request resolved: pytorch#174769
Approved by: https://github.com/nikitaved, https://github.com/eqy

pytorch-bot Bot added the release notes: linalg_frontend release notes category label Feb 11, 2026

pytorchbot added the open source label Feb 11, 2026

seemethere mentioned this pull request Feb 15, 2026

Consolidate or retire pytorch/almalinux-builder Docker images after MAGMA deprecation #175045

Open

nikitaved mentioned this pull request Feb 26, 2026

[CUDA][cuSOLVER] torch.cholesky_solve - performance improvements for batched inputs #175898

Closed

Remove MAGMA path from cholesky_solve

8ce4135

gderossi force-pushed the deprecate-magma-cholesky-solve branch from 7b14777 to 8ce4135 Compare March 2, 2026 22:31

gderossi marked this pull request as ready for review March 2, 2026 22:33

gderossi requested review from Aidyn-A, IvanYashchuk, eqy, lezcano, nikitaved and syed-ahmed as code owners March 2, 2026 22:33

nikitaved approved these changes Mar 3, 2026

nikitaved added ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 3, 2026

eqy reviewed Mar 3, 2026

pytorchmergebot added the merging label Mar 3, 2026

pytorchmergebot removed the merging label Mar 3, 2026

eqy approved these changes Mar 3, 2026

pytorchmergebot added the merging label Mar 3, 2026

pytorchmergebot removed the merging label Mar 3, 2026

Aidyn-A reviewed Mar 4, 2026

Merge branch 'main' into deprecate-magma-cholesky-solve

069d700

pytorch-bot Bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 4, 2026

Aidyn-A added ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 4, 2026

Remove use_magma_ flag

409a19f

pytorch-bot Bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 4, 2026

pytorch-bot Bot added ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/trunk Trigger trunk jobs on your pull request labels Mar 4, 2026

pytorchmergebot added the merging label Mar 4, 2026

pytorchmergebot added the Merged label Mar 4, 2026

pytorchmergebot closed this in f2f3977 Mar 4, 2026

pytorchmergebot removed the merging label Mar 4, 2026

gderossi deleted the deprecate-magma-cholesky-solve branch March 9, 2026 14:25
