@kgajdamo
Contributor

  • Added a hetero neighbor sampler benchmark to pyg-lib.
  • The benchmark measures the performance of hetero_neighbor_sample from pyg-lib as well as hetero_neighbor_sample from pytorch_sparse.

@kgajdamo kgajdamo requested a review from rusty1s September 19, 2022 13:11

path = osp.join(osp.dirname(osp.realpath(__file__)), '../../data/OGB')
transform = T.ToUndirected(merge=True)
dataset = OGB_MAG(path, preprocess='metapath2vec', transform=transform)
Member

@rusty1s rusty1s Sep 20, 2022

We can move this to pyg_lib.testing.withDataset. WDYT? And then just have it return a dictionary of (rowptr, col) entries?

Contributor Author

Ok, I'll try

Contributor Author

I changed the code so that the dataset can be retrieved using the decorator. But for the purposes of the benchmark, it is necessary not only to return the (rowptr_dict, col_dict) but also the number of nodes, edge types and node types. Please see if such an implementation suits you.
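A minimal sketch of the decorator idea, for illustration only. The decorator name `with_toy_hetero_dataset`, the toy graph, and the exact return tuple are assumptions; this is not the actual `pyg_lib.testing` implementation.

```python
import functools


def with_toy_hetero_dataset(func):
    """Hypothetical decorator: injects a dataset loader into the wrapped function."""

    def load():
        # Each edge type maps to a CSR pair (rowptr, col).
        # Toy graph: 2 authors, 3 papers (made-up data).
        rowptr_dict = {
            ('author', 'writes', 'paper'): [0, 2, 3],
            ('paper', 'cites', 'paper'): [0, 1, 1, 2],
        }
        col_dict = {
            ('author', 'writes', 'paper'): [0, 1, 2],
            ('paper', 'cites', 'paper'): [1, 0],
        }
        num_nodes_dict = {'author': 2, 'paper': 3}
        node_types = list(num_nodes_dict)
        edge_types = list(rowptr_dict)
        return rowptr_dict, col_dict, num_nodes_dict, node_types, edge_types

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Pass the loader rather than the data, so loading happens on demand.
        return func(*args, dataset=load, **kwargs)

    return wrapper


@with_toy_hetero_dataset
def count_edges(dataset):
    rowptr_dict, col_dict, num_nodes_dict, node_types, edge_types = dataset()
    return sum(len(col_dict[edge_type]) for edge_type in edge_types)


print(count_edges())
```

Returning the metadata together with the CSR dictionaries keeps the benchmark body free of dataset-specific code.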

In addition, I noticed that sampling with replacement is not working properly for hetero sample in pyg-lib.

Member

Thanks! Can you clarify what is not working for sampling with replacement? Does it simply crash?

Contributor Author

@kgajdamo kgajdamo Sep 23, 2022

It crashes in uniform_sample() -> add()

Member

Thanks, will take a look!

@codecov-commenter

codecov-commenter commented Sep 22, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.32%. Comparing base (38ba009) to head (b7a9105).
Report is 178 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #106   +/-   ##
=======================================
  Coverage   89.32%   89.32%           
=======================================
  Files          16       16           
  Lines         412      412           
=======================================
  Hits          368      368           
  Misses         44       44           

@kgajdamo kgajdamo force-pushed the hetero_neighbor_bench branch from 596e57a to f904d98 Compare September 22, 2022 13:18
@rusty1s
Member

rusty1s commented Sep 23, 2022

This is great, thank you very much!

It's interesting that torch-sparse performs better on [-1] neighborhoods. Wondering why this is the case.

@kgajdamo
Contributor Author

This is great, thank you very much!

It's interesting that torch-sparse performs better on [-1] neighborhoods. Wondering why this is the case.

Thanks @rusty1s for the updates. I have a question about your changes. In the hetero_neighbor.py file, in the pytorch sampler case, you put some declarations (node_types, edge_types, etc.) inside the loop. These declarations increase the measured time, and there is probably no need to repeat them on every iteration. Was this on purpose?
Another question: should I add the DGL hetero neighbor sampler to the script?
Also, why do we take the matrix in CSR format when the variable names are colptr and row?
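To make the first question concrete, here is a rough sketch of how setup placed inside the timed loop is counted in the measurement. `build_metadata` and `fake_sample` are hypothetical stand-ins, not code from hetero_neighbor.py.

```python
import time


def build_metadata():
    # Hypothetical stand-in for declaring node_types / edge_types.
    node_types = ['paper', 'author']
    edge_types = [('author', 'writes', 'paper'), ('paper', 'cites', 'paper')]
    return node_types, edge_types


def fake_sample(node_types, edge_types):
    # Placeholder for the sampler call being benchmarked.
    return len(node_types) + len(edge_types)


def bench(num_iters, setup_inside_loop):
    """Return (elapsed_seconds, number_of_setup_calls)."""
    setup_calls = 0
    meta = None
    if not setup_inside_loop:
        meta = build_metadata()  # done once, outside the timed region
        setup_calls += 1
    start = time.perf_counter()
    for _ in range(num_iters):
        if setup_inside_loop:
            meta = build_metadata()  # repeated, and included in the timing
            setup_calls += 1
        fake_sample(*meta)
    return time.perf_counter() - start, setup_calls


inside_time, inside_calls = bench(1000, setup_inside_loop=True)
outside_time, outside_calls = bench(1000, setup_inside_loop=False)
print(inside_calls, outside_calls)
```

Which variant is the fairer comparison depends on whether the other sampler's wrapper performs the same setup on every call.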

@kgajdamo
Contributor Author

This is great, thank you very much!

It's interesting that torch-sparse performs better on [-1] neighborhoods. Wondering why this is the case.

Yes, the pytorch_sparse sampler is faster when sampling all one-hop neighbors. Out of curiosity, I checked the case num_neighbors=[-1, -1], and there pyg-lib is ~2 times faster. Here are the results:
Screenshot 2022-09-23 at 16 38 23
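
For readers unfamiliar with the notation, here is a small pure-Python sketch of what num_neighbors=[-1] and [-1, -1] mean on a CSR graph: each entry is a per-hop fan-out, and -1 keeps every neighbor. The function names and the toy graph are assumptions for illustration, and this is not the pyg-lib or pytorch_sparse implementation.

```python
import random


def sample_hop(rowptr, col, seeds, k, rng):
    """Sample up to k neighbors per seed without replacement; k < 0 keeps all."""
    out = []
    for v in seeds:
        nbrs = col[rowptr[v]:rowptr[v + 1]]
        if k < 0 or k >= len(nbrs):
            out.extend(nbrs)
        else:
            out.extend(rng.sample(nbrs, k))
    return out


def sample(rowptr, col, seeds, num_neighbors, seed=0):
    """Return the list of nodes reached at each hop."""
    rng = random.Random(seed)
    frontier, hops = list(seeds), []
    for k in num_neighbors:
        # Deduplicate so each node is expanded once in the next hop.
        frontier = sorted(set(sample_hop(rowptr, col, frontier, k, rng)))
        hops.append(frontier)
    return hops


# Toy graph: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 -> {}
rowptr = [0, 2, 3, 4, 4]
col = [1, 2, 3, 3]

print(sample(rowptr, col, [0], [-1]))
print(sample(rowptr, col, [0], [-1, -1]))
```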

@rusty1s
Member

rusty1s commented Sep 25, 2022

Was it on purpose?

Yes, IMO that makes the comparison fairer, as our Python wrapper around pyg-lib's sampling code does the same.

Another question is should I add dgl hetero neighbor sampler to the script?

Yes, this would be very valuable.

And also I would like to ask why we take the matrix in csr format but the variable names are colptr and row?

We need to use CSC here since that is the only format torch-sparse supports. Note that we take the CSR format of the transposed adjacency matrix, which corresponds to CSC.
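
The CSR/CSC correspondence can be checked on a toy edge list. This is a pure-Python sketch with made-up helper names (`to_csr`, `to_csc`), not code from the benchmark.

```python
def to_csr(num_rows, edges):
    """Compress (row, col) pairs by row; returns (rowptr, col)."""
    counts = [0] * num_rows
    for r, _ in edges:
        counts[r] += 1
    rowptr = [0]
    for c in counts:
        rowptr.append(rowptr[-1] + c)
    col = [c for _, c in sorted(edges)]
    return rowptr, col


def to_csc(num_cols, edges):
    """Compress (row, col) pairs by column; returns (colptr, row)."""
    counts = [0] * num_cols
    for _, c in edges:
        counts[c] += 1
    colptr = [0]
    for c in counts:
        colptr.append(colptr[-1] + c)
    row = [r for r, _ in sorted(edges, key=lambda e: (e[1], e[0]))]
    return colptr, row


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]   # adjacency matrix A, 3 x 3
transposed = [(c, r) for r, c in edges]    # A^T

rowptr_t, col_t = to_csr(3, transposed)    # CSR of A^T
colptr, row = to_csc(3, edges)             # CSC of A

print(rowptr_t == colptr, col_t == row)
```

Both pairs hold the same arrays, which is why the variables are named colptr and row even though they are built with a CSR conversion.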

@kgajdamo
Contributor Author

Right, thank you for the detailed answer :)

@kgajdamo kgajdamo deleted the hetero_neighbor_bench branch November 3, 2023 07:56