jcaip · 2025-04-03T17:39:15Z

This PR is meant to give users the ability to accelerate LLMs with 2:4 activation sparsity, using the approach outlined in our ICLR workshop paper: https://arxiv.org/abs/2503.16672

The main contribution is a CUTLASS `24_fp8_pack` kernel, which I've copied over from xFormers. Given a normal dense tensor, it computes the packed data and metadata relatively efficiently.
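To make "packed data and metadata" concrete, here is a minimal pure-Python sketch of 2:4 packing. It is not the CUTLASS/xFormers kernel: the function names `sparsify_24` and `densify_24` are hypothetical, the selection rule (keep the two largest-magnitude values in each group of four, ties broken by lower index) is an assumption, and the real kernel works on fp8 data with a hardware-specific metadata bit layout rather than index tuples.

```python
def sparsify_24(row):
    """Reference sketch: keep 2 of every 4 values, return (packed, metadata).

    Assumption: the kept values are the two with the largest magnitude;
    ties are broken in favour of the lower index.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    packed, metadata = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Rank positions by descending magnitude, take the top two,
        # then restore ascending position order for packing.
        ranked = sorted(range(4), key=lambda i: (-abs(group[i]), i))
        keep = sorted(ranked[:2])
        packed.extend(group[i] for i in keep)
        metadata.append(tuple(keep))  # positions of the kept values
    return packed, metadata


def densify_24(packed, metadata):
    """Rebuild the dense (pruned) row from packed values and metadata."""
    dense = []
    for g, keep in enumerate(metadata):
        group = [0.0] * 4
        for slot, i in enumerate(keep):
            group[i] = packed[2 * g + slot]
        dense.extend(group)
    return dense


row = [0.1, -2.0, 0.5, 3.0, 1.0, 1.0, -1.0, 0.0]
packed, metadata = sparsify_24(row)
print(packed)    # half as many values as the input
print(metadata)  # which two positions survived in each group of four
```

The packed tensor holds half the elements of the dense input, and the metadata records where each kept value belongs, which is the information a 2:4 sparse matmul needs.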

Performance Benchmarks

python benchmarks/benchmark_e2e_fp8_sparse_linear.py 

|   num_tokens |   bf16_latency (us) |   bf16_c_latency (us) |   fp8_c_time (us) |   fp8_c_sparse_time (us) |   fp8_c_activation_sparse_time (us) |   speedup |
|-------------:|--------------------:|----------------------:|------------------:|-------------------------:|------------------------------------:|----------:|
|           64 |             166.816 |               163.04  |           103.008 |                   74.304 |                             102.816 |  1.00187  |
|          128 |             156.256 |               151.52  |            99.936 |                   75.456 |                             102.048 |  0.979304 |
|          256 |             172.288 |               159.584 |           114.08  |                   82.432 |                             111.072 |  1.02708  |
|          512 |             218.88  |               204.608 |           144.096 |                  114.56  |                             139.488 |  1.03304  |
|         1024 |             394.4   |               392.544 |           251.104 |                  196.416 |                             227.904 |  1.1018   |
|         2048 |             764.608 |               734.816 |           480.704 |                  381.152 |                             426.688 |  1.12659  |
|         4096 |            1658.82  |              1623.58  |           901.344 |                  779.008 |                             843.392 |  1.06871  |
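The table does not say how the `speedup` column is derived. The reported values are consistent with `fp8_c_time / fp8_c_activation_sparse_time`, so the sketch below recomputes it under that assumption, using the numbers from the table; the variable and function names are illustrative only.

```python
# (num_tokens, fp8_c_time_us, fp8_c_activation_sparse_time_us, reported_speedup)
rows = [
    (64, 103.008, 102.816, 1.00187),
    (128, 99.936, 102.048, 0.979304),
    (256, 114.08, 111.072, 1.02708),
    (512, 144.096, 139.488, 1.03304),
    (1024, 251.104, 227.904, 1.1018),
    (2048, 480.704, 426.688, 1.12659),
    (4096, 901.344, 843.392, 1.06871),
]


def speedup(dense_us, sparse_us):
    """Assumed definition: dense fp8 latency over activation-sparse fp8 latency."""
    return dense_us / sparse_us


for num_tokens, dense_us, sparse_us, reported in rows:
    print(f"{num_tokens:>5}  computed={speedup(dense_us, sparse_us):.5f}  reported={reported}")
```

Read this way, a value above 1 means the activation-sparse fp8 path is faster than the dense fp8 path at that token count.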

Tests

pytest tests/sparsity/test_activation24.py

pytorch-bot · 2025-04-03T17:39:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2012

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 126166f with merge base 66eb801:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run TorchAO Experimental Tests / test-cpu-ops (macos-14) (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jcaip added 5 commits March 24, 2025 10:29

wip to get sample op working

f62745f

test

02b65de

wip

cf503aa

update kernel for metadat

5a5ca43

wip

9cbfed0

facebook-github-bot added the CLA Signed label Apr 3, 2025

jcaip changed the title ~~Jcaip/actiation24~~ [WIP] 2:4 activation sparsity Apr 3, 2025

jcaip added 21 commits April 14, 2025 13:17

almost working!

268b74d

working but not on random inputs

1a933f9

cleaned up cuda files

f3b67b0

packing is working now!

9f4346d

update

0b62fba

Merge branch 'main' into jcaip/actiation24

f505851

wip

e8a5d5b

updated to not sort 1x16 at a time

b168814

checkpoint

559ca90

removed a lot of templating to try and merge index creation and packing

91f18da

srelu speedups

a5b13a0

wip integrating xformers kernels

6957add

update namespare

0bf05ec

remove extra CUTLASS files

4e91722

more cleanup

65d3c1e

clean up op registration

88bec35

added test for srelu linear

3d4aa93

cleanup

a5b9cab

updated benchmarks + cleaned up prototype folder some more

645576b

added ruff

35263ae

fixed setup

a31dcd5

jcaip mentioned this pull request Apr 22, 2025

[Tracker] TorchAO activation sparsity acceleration 🚀 #2095

Open

9 tasks

jcaip marked this pull request as ready for review April 29, 2025 03:58

jcaip changed the title ~~[WIP] 2:4 activation sparsity~~ 2:4 activation sparsity packing kernels Apr 29, 2025

jcaip added the sparsity and topic: new feature labels Apr 29, 2025

Merge branch 'main' into jcaip/actiation24

7cdd43a

jerryzh168 approved these changes Apr 29, 2025

jcaip and others added 6 commits May 12, 2025 04:59

fix ruff for utils

9ff58a4

ruff fix

f1e9eb1

Merge remote-tracking branch 'refs/remotes/origin/jcaip/actiation24' into jcaip/actiation24

3172f09

Merge branch 'main' into jcaip/actiation24

46b19e8

fix ruff

5b99cd8

ruff format

126166f

jcaip merged commit 9b1256f into main May 12, 2025
33 of 34 checks passed

jcaip deleted the jcaip/actiation24 branch May 21, 2025 22:03

2:4 activation sparsity packing kernels #2012
jcaip merged 33 commits into main from jcaip/actiation24
