[MAGMA][CUDA] cholesky_solve: deprecate MAGMA and dispatch to cuSolver unconditionally#174769

Closed
gderossi wants to merge 3 commits into pytorch:main from gderossi:deprecate-magma-cholesky-solve

Conversation

@gderossi
Contributor

@gderossi gderossi commented Feb 11, 2026

Both cuSolver and hipSolver implement potrs, so this PR removes the MAGMA path entirely, adds a deprecation warning, and updates the tests to skip only when cuSolver is missing. Thanks @nikitaved for the performance improvements in #175898! Note that cholesky_inverse depends on the cholesky_solve functions, so #174681 should be merged before this.
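
The dependency between the two operators noted above follows from the math: the inverse of A is the solution of A X = I, so a Cholesky-based inverse can be obtained from a Cholesky solve against the identity. A minimal CPU-only sketch of that relationship follows. The matrix size, seed, and dtype are assumptions chosen for illustration, and the sketch does not show how the CUDA kernels are wired internally.

```python
import torch

torch.manual_seed(0)
n = 4  # illustrative size, not taken from the PR

# Build a symmetric positive-definite matrix so the Cholesky factor exists.
R = torch.rand(n, n, dtype=torch.float64)
A = R @ R.mT + torch.eye(n, dtype=torch.float64)
L = torch.linalg.cholesky(A)  # lower-triangular factor, A = L @ L.mT

# Solving A X = I with the factor yields the inverse of A.
identity = torch.eye(n, dtype=torch.float64)
inv_via_solve = torch.cholesky_solve(identity, L)
inv_direct = torch.cholesky_inverse(L)

torch.testing.assert_close(inv_via_solve, inv_direct)
torch.testing.assert_close(A @ inv_via_solve, identity)
```

Both checks pass when the two routes agree, which is why removing a shared solve helper can affect the inverse as well.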

Benchmarking script:

```python
import torch
import torch.utils.benchmark as benchmark

from itertools import product

results = []

batches = [(), (16,), (64,)]
sizes = [16, 128, 512, 2048]

for b, n in product(batches, sizes):
    shape = b + (n, n)
    print(f"Testing shape={shape}")
    label = "torch.cholesky_solve"
    sub_label = f"{shape}"
    A = torch.rand(*shape, device="cuda")
    A = A @ A.mT + torch.eye(n, device="cuda")
    B = torch.rand(*shape, device="cuda")
    L = torch.linalg.cholesky(A)
    stmt = "torch.cholesky_solve(B, L)"
    for backend in ("magma", "cusolver"):
        torch.backends.cuda.preferred_linalg_library(backend)
        # warm-up
        for _ in range(5):
            exec(stmt)

        results.append(benchmark.Timer(
            stmt=stmt,
            globals={'L': L, 'B': B},
            label=label,
            sub_label=sub_label,
            description=backend,
        ).blocked_autorange(min_run_time=1))

compare = benchmark.Compare(results)
compare.print()
```

Benchmarking results on RTX Pro 6000:

```
[------------ torch.cholesky_solve -----------]
                        |   magma   |  cusolver | speedup
1 threads: ------------------------------------ |
      (16, 16)          |   1579.0  |     21.3  | 74.1
      (128, 128)        |   1593.5  |     50.1  | 31.8
      (512, 512)        |   1731.0  |    226.3  | 7.6
      (2048, 2048)      |   2956.9  |   1587.6  | 1.9
      (16, 16, 16)      |     28.7  |     28.3  | 1.0
      (16, 128, 128)    |    601.8  |    121.8  | 4.9
      (16, 512, 512)    |   3246.6  |    735.5  | 4.4
      (16, 2048, 2048)  |  23108.0  |  11620.9  | 2.0
      (64, 16, 16)      |     29.5  |     28.2  | 1.0
      (64, 128, 128)    |    644.3  |    140.5  | 4.6
      (64, 512, 512)    |   4571.2  |   1840.4  | 2.5
      (64, 2048, 2048)  |  65104.7  |  42007.7  | 1.5

Times are in microseconds (us).
```
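
The speedup column in the results above appears to be the MAGMA time divided by the cuSolver time, rounded to one decimal. A small dependency-free sketch that recomputes it from the reported timings follows. The `timings` list and the `speedup` helper are names introduced here for illustration, not part of the PR.

```python
# (shape, magma_us, cusolver_us), copied from the benchmark results above.
timings = [
    ((16, 16), 1579.0, 21.3),
    ((128, 128), 1593.5, 50.1),
    ((512, 512), 1731.0, 226.3),
    ((2048, 2048), 2956.9, 1587.6),
    ((16, 16, 16), 28.7, 28.3),
    ((16, 128, 128), 601.8, 121.8),
    ((16, 512, 512), 3246.6, 735.5),
    ((16, 2048, 2048), 23108.0, 11620.9),
    ((64, 16, 16), 29.5, 28.2),
    ((64, 128, 128), 644.3, 140.5),
    ((64, 512, 512), 4571.2, 1840.4),
    ((64, 2048, 2048), 65104.7, 42007.7),
]


def speedup(magma_us, cusolver_us):
    """Ratio of MAGMA time to cuSolver time; values above 1 favour cuSolver."""
    return magma_us / cusolver_us


# Round to one decimal to match the precision used in the table.
speedups = {shape: round(speedup(m, c), 1) for shape, m, c in timings}

for shape, ratio in speedups.items():
    print(f"{str(shape):>18}: {ratio:.1f}x")
```

The recomputed ratios match the reported column. The smallest batched shapes sit at parity (1.0), so none of the measured shapes is slower with cuSolver.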

cc @nikitaved @eqy

@pytorch-bot

pytorch-bot Bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174769

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 409a19f with merge base 596dbc5 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the release notes: linalg_frontend release notes category label Feb 11, 2026
pytorchmergebot pushed a commit that referenced this pull request Mar 2, 2026
…batched inputs (#175898)

cuSOLVER status: `potrsBatched` only supports `nrhs=1`, while looped `potrs` is slower compared to two batched calls to `solve_triangular`.

This PR should unblock #174769, since it makes the cuSOLVER backend faster than MAGMA (since cuSOLVER triangular solve is faster).
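
The approach described above relies on a Cholesky solve being equivalent to two triangular solves: with A = L Lᴴ, solve L Y = B and then Lᴴ X = Y. A minimal CPU sketch of that equivalence for batched inputs with several right-hand sides follows. The shapes, seed, and dtype are illustrative assumptions, and this is not the PR's CUDA code path.

```python
import torch

torch.manual_seed(0)
batch, n, nrhs = 4, 3, 2  # illustrative sizes; nrhs > 1 is the batched case at issue

# Batch of symmetric positive-definite matrices and right-hand sides.
R = torch.rand(batch, n, n, dtype=torch.float64)
A = R @ R.mT + torch.eye(n, dtype=torch.float64)
B = torch.rand(batch, n, nrhs, dtype=torch.float64)

L = torch.linalg.cholesky(A)  # lower-triangular, A = L @ L.mH

# Step 1: forward substitution, L Y = B.
Y = torch.linalg.solve_triangular(L, B, upper=False)
# Step 2: back substitution, L^H X = Y.
X = torch.linalg.solve_triangular(L.mH, Y, upper=True)

# The two-step result agrees with the single cholesky_solve call.
torch.testing.assert_close(X, torch.cholesky_solve(B, L))
torch.testing.assert_close(A @ X, B)
```

Because `solve_triangular` accepts batched inputs with any number of right-hand-side columns, the two-step form avoids looping over the batch.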

Benchmarks:
```
[------ torch.cholesky_solve -----]
                        |  new    | old
1 threads: ---------------------------
      (16, 16, 16)      |     29.7   |   157.6
      (16, 128, 128)    |    120.8   |   713.2
      (16, 512, 512)    |    730.9   |   3718.6
      (16, 2048, 2048)  |  11594.7   |   25689.6
      (64, 16, 16)      |     29.7   |   608.8
      (64, 128, 128)    |    139.3   |   2831.9
      (64, 512, 512)    |   1828.7   |   15173.4
      (64, 2048, 2048)  |  41961.1   |   102721.2

Times are in microseconds (us).
```

Thanks to @gderossi for the benchmarks comparing cuSOLVER kernels vs MAGMA, and his analysis of cuSOLVER's limitations. All this ultimately led to creating this PR.

Pull Request resolved: #175898
Approved by: https://github.com/eqy, https://github.com/Aidyn-A
@gderossi gderossi force-pushed the deprecate-magma-cholesky-solve branch from 7b14777 to 8ce4135 Compare March 2, 2026 22:31
@gderossi gderossi marked this pull request as ready for review March 2, 2026 22:33
Collaborator

@nikitaved nikitaved left a comment

👍

@nikitaved nikitaved added ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 3, 2026
@eqy
Collaborator

eqy commented Mar 3, 2026

@pytorchmergebot merge

Collaborator

@eqy eqy left a comment

looks like rocm build is broken?

@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • Linear Algebra (mruberry, lezcano, IvanYashchuk)
  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, malfet, albanD, ...)
Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@eqy
Collaborator

eqy commented Mar 3, 2026

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

Collaborator

The build failure is real. Here it is attempting to call apply_cholesky_solve, which you have removed.

Collaborator

Okay, that makes sense, as PR #174681 should land first.

Contributor Author

Yep, that failure is not a surprise. #174681 has now landed so hopefully everything should work.

@pytorch-bot pytorch-bot Bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 4, 2026
@Aidyn-A Aidyn-A added ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 4, 2026
@pytorch-bot pytorch-bot Bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 4, 2026
@eqy
Collaborator

eqy commented Mar 4, 2026

@pytorchmergebot label ciflow/trunk ciflow/rocm-mi300

@pytorch-bot pytorch-bot Bot added ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/trunk Trigger trunk jobs on your pull request labels Mar 4, 2026
@gderossi
Contributor Author

gderossi commented Mar 4, 2026

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@gderossi gderossi deleted the deprecate-magma-cholesky-solve branch March 9, 2026 14:25
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…batched inputs (pytorch#175898)

EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…r unconditionally (pytorch#174769)

Pull Request resolved: pytorch#174769
Approved by: https://github.com/nikitaved, https://github.com/eqy
Labels

ciflow/rocm-mi300 (Trigger "default" config CI on ROCm MI300), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, release notes: linalg_frontend (release notes category)
