Add softmax_csr implementation
#264
Conversation
Force-pushed 2b91285 to c56a93b
Codecov Report

```diff
@@           Coverage Diff            @@
##           master     #264    +/-   ##
==========================================
+ Coverage   85.65%   86.19%   +0.54%
==========================================
  Files          32       34       +2
  Lines        1115     1188      +73
==========================================
+ Hits          955     1024      +69
- Misses        160      164       +4
==========================================
```

View full report in Codecov by Sentry.
@pyg-team/intel-team Please take a look.
pyg_lib/ops/__init__.py (Outdated)

```python
            [0.0598, 0.2923, 0.1206, 0.0921],
            [0.7792, 0.3502, 0.1638, 0.2145]])
    """
    if src.dim() != 2 or not src.is_cpu or ptr is None or dim != 0:
```
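The guard above routes inputs that the fused kernel does not support to another code path. A generic sketch of this dispatch pattern (hypothetical `softmax_with_fallback` name, not the actual pyg-lib wrapper) could look like:

```python
import torch
from typing import Optional

def softmax_with_fallback(src: torch.Tensor,
                          ptr: Optional[torch.Tensor] = None,
                          dim: int = 0) -> torch.Tensor:
    # Hypothetical dispatch sketch: inputs outside the supported case
    # (2D CPU tensor with CSR groups over dim 0) fall back to a plain
    # PyTorch softmax over `dim`.
    if src.dim() != 2 or not src.is_cpu or ptr is None or dim != 0:
        return torch.softmax(src, dim=dim)  # generic fallback path
    # The specialized CSR kernel would be invoked here.
    raise NotImplementedError("fused softmax_csr kernel goes here")
```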
I wonder why `ptr` is optional, since if you don't provide it, you get an error.
I wanted to make the API in its final form; otherwise, each change here would require a corresponding change in pytorch_geometric. I'll add support for `index` in the near future.
@kgajdamo, after rethinking your suggestion, I decided to change the API and create a specialized `softmax_csr` operation that accepts `ptr` only. Rationale: `torch.compile` gives nice results for softmax with groups defined via `index`, so I don't see a reason to have a specialized kernel for that option.
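For reference, this is what a softmax with groups defined via `ptr` computes: a numerically stable softmax applied independently within each segment of rows delimited by consecutive CSR offsets. A pure-PyTorch sketch (hypothetical `softmax_csr_reference` helper, not the pyg-lib kernel itself) could look like:

```python
import torch

def softmax_csr_reference(src: torch.Tensor, ptr: torch.Tensor) -> torch.Tensor:
    # Apply softmax over dim 0 independently within each group of rows
    # delimited by consecutive offsets in `ptr` (CSR-style group boundaries).
    out = torch.empty_like(src)
    for start, end in zip(ptr[:-1].tolist(), ptr[1:].tolist()):
        group = src[start:end]
        # Subtract the per-column max for numerical stability.
        group = group - group.max(dim=0, keepdim=True).values
        exp = group.exp()
        out[start:end] = exp / exp.sum(dim=0, keepdim=True)
    return out
```

Each column of a group sums to one, since the softmax is taken over the rows of that group.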
Force-pushed 5dad205 to 167793a
softmax_csr implementation
Force-pushed 167793a to 6703779
@kgajdamo @rusty1s Please take a look. I made the softmax implementation a bit more general, so now it covers any
Hi @DamianSzwichtenberg, this PR looks good to me. The overall structure of the softmax kernel with sparse input is similar to that of the dense-input softmax kernel in PyTorch. With the sparsity, the performance boost comes from parallelism, right? And will this PR be upstreamed to PyTorch later? There is no SparseCsr support for softmax in PyTorch yet.
These kernels differ quite a bit. In
There are no plans to upstream this operation to PyTorch. As above, |
Looks good to me.
Force-pushed 68a2723 to cb4bb68
```python
class Softmax(torch.autograd.Function):
    @staticmethod
    def forward(
```
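The excerpt above is truncated. A minimal self-contained sketch of such an autograd `Function` (using pure-PyTorch math over a single group and a hypothetical `GroupedSoftmax` name, not the pyg-lib C++ kernel) could look like:

```python
import torch

class GroupedSoftmax(torch.autograd.Function):
    # Hypothetical sketch illustrating the Function structure: a softmax
    # over dim 0 with a single group, with a hand-written backward pass.
    @staticmethod
    def forward(ctx, src: torch.Tensor) -> torch.Tensor:
        out = torch.softmax(src, dim=0)
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        (out,) = ctx.saved_tensors
        # Softmax Jacobian-vector product: out * (g - sum(g * out)).
        return out * (grad_out - (grad_out * out).sum(dim=0, keepdim=True))
```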
Can we define the autograd function directly in C++?
Should be possible, will check.
Change available at #282
This PR uses the optimized `softmax_csr` operation (introduced in [pyg-lib @ 264](pyg-team/pyg-lib#264)) when the input is a CPU tensor and the softmax groups are defined via `ptr`.
This PR adds the forward and backward implementation of the sparse softmax operation, as defined here.
In the `pytorch_geometric` implementation we cannot take advantage of model compilation when groups are defined via `ptr`. The `softmax_csr` operation introduced here provides a well-performing kernel for such a scenario.

Performance boost (achieved on a 28C, single-socket machine):
- ~7x for the forward pass
- ~8x for the backward pass

Additionally, GAT training time was reduced by ~5%.
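The two group representations mentioned above are interchangeable: CSR offsets (`ptr`) can be expanded into per-row group ids (`index`) and back. A small sketch of that conversion (hypothetical helper names, assuming `index` is sorted) could look like:

```python
import torch

def ptr_to_index(ptr: torch.Tensor) -> torch.Tensor:
    # Expand CSR offsets, e.g. [0, 2, 5], into per-row group ids
    # [0, 0, 1, 1, 1] by repeating each group id by its segment length.
    return torch.arange(ptr.numel() - 1).repeat_interleave(ptr.diff())

def index_to_ptr(index: torch.Tensor, num_groups: int) -> torch.Tensor:
    # Inverse direction: build offsets from cumulative group sizes.
    sizes = torch.bincount(index, minlength=num_groups)
    return torch.cat([torch.zeros(1, dtype=torch.long), sizes.cumsum(0)])
```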