We are excited to announce the release of pyg-lib 0.5 🎉🎉🎉

What's Changed

Increment subgraph id globally for all seed nodes by @kgajdamo in #304
Fix tests on macOS ARM by @rusty1s in #305
fix bug to add large tensor support by @kaixuanliu in #308
Add macOS M1 support by @rusty1s in #310
Fix macOS nightly build by @rusty1s in #312
Fix macOS install (part2) by @rusty1s in #313
Build for Windows by @rusty1s in #315
Update Windows build by @rusty1s in #317
Add PyTorch 2.3 support by @rusty1s in #322
Load libpyg.so first to let torch.library.register_fake find custom operators by @akihironitta in #329
Ensure consistent line endings in the repository by @akihironitta in #332
Add PyTorch 2.4 support by @rusty1s in #338
Add PyTorch 2.4 support by @rusty1s in #339
Drop Windows/PyTorch 2.0.0/cu121 builds by @rusty1s in #340
Add sphinx-lint by @rusty1s in #341
Support for sphinx_copybutton by @rusty1s in #342
Autodocument type hints by @rusty1s in #343
Fix docs build in CI by @akihironitta in #351
Drop Python 3.8 support by @akihironitta in #356
Add PyTorch 2.5 support by @rusty1s in #360
Fix typo in README by @rusty1s in #362
Remove unused enumerate in fused_scatter_reduce by @akihironitta in #370
Remove usage of torch::autograd::Variable by @akihironitta in #369
Add cuCollections dependency by @rusty1s in #371
Limit concurrency for nightly jobs by @akihironitta in #372
Boilerplate for custom HashMap implementation by @rusty1s in #373
Basic implementation of CPUHashMap by @rusty1s in #375
Support various data types in CPUHashMap by @rusty1s in #376
Add asserts and default get to HashMap by @rusty1s in #377
Pybind support for CPUHashMap by @rusty1s in #378
CPUHashMap benchmark by @rusty1s in #379
Use multi-threading and parallel hash map in CPUHashMap by @rusty1s in #380
Implement dynamic polymorphism in CPUHashMap by @rusty1s in #381
Restructure file layout for HashMap - prepare CUDA version by @rusty1s in #382
Fix documentation builds by @rusty1s in #384
Update labeling procedure by @rusty1s in #385
Re-structure includes by @rusty1s in #388
Fix CI by @rusty1s in #390
Add CUDAHashMapImpl via cuCollections by @rusty1s in #391
Robustify CUDAHashMap implementation + add serialization by @rusty1s in #392
Align CPUHashMap and CUDAHashMap implementations by @rusty1s in #393
[Bug] Fixing bug in fused_scatter_reduce function. by @drivanov in #394
Introduce load_factor - use dtype.int by default in HashMap benchmark by @rusty1s in #396
Debug Windows build by @rusty1s in #397
Disable Windows support in CUDAHashMap by @rusty1s in #398
Support num_submaps in CPUHashMap by @rusty1s in #399
Remove unnecessary argsort in CUDAHashMap.keys() by @rusty1s in #400
Add tests for HashMap python bindings by @rusty1s in #401
Update benchmark script to respect num_submaps by @rusty1s in #402
Expose size, dtype and device in HashMap by @rusty1s in #403
CI: Auto-merge PRs from bots by @akihironitta in #405
Replace c10::optional with equivalent std::optional by @akihironitta in #406
NeighborSampler boilerplate by @rusty1s in #413
Revert "Replace c10::optional with equivalent std::optional" by @rusty1s in #416
Implement a basic hetero sampler in the class by @vid-koci in #415
Add MetapathTracker helper class to NeighborSampler by @vid-koci in #417
Oversample metapaths where number of samples was lower than expected by @vid-koci in #419
PyTorch 2.6 support by @rusty1s in #421
Fix SetDevice by @rusty1s in #422
Correctly set CUDA device by @rusty1s in #423
Add pyproject.toml by @rusty1s in #426
Move stylers/linters to pyproject.toml by @rusty1s in #428
Python 3.13 support by @rusty1s in #429
Replace random shuffle with a sort in MetapathAwareNeighborSampler by @vid-koci in #425
Bump CI's ubuntu to latest by @Kh4L in #432
Add CXX11 ABI build support by @Kh4L in #431
Only trigger automerge workflow on opening a PR by @akihironitta in #434
Add NO_METIS flag by @rusty1s in #436
Include Dispatch.h by @rusty1s in #437
Fix the NO_METIS env var by @vid-koci in #438
Add a header for the Neighbor sampler class by @vid-koci in #439
Additional 0 neighbor checks and a test by @vid-koci in #440
Fix CUDA architectures to build for in CI by @akihironitta in #443
ci: Tiny clean up by @akihironitta in #444
Avoid deprecated format in project.license field in pyproject.toml by @akihironitta in #446
Revert "Avoid deprecated format in project.license field in pyproject.toml (#446)" by @akihironitta in #449
Fix typo in README.md by @akihironitta in #450
Support PyTorch 2.7 and CUDA 12.8 by @akihironitta in #442
ci: Set up dependabot by @akihironitta in #451
Update README.md on PyTorch 2.7 and CUDA 12.8 support by @akihironitta in #456
Remove PyTorch 1.12 handling from CI by @akihironitta in #459
Build for all Python versions on Linux by @rusty1s in #465
Fix Windows build by @rusty1s in #466
ci: Cancel running jobs when a new commit is pushed to PR by @akihironitta in #469
ci: Prepare for decoupling build configs from workflows by @akihironitta in #468
ci: Add reusable workflow building Linux wheels by @akihironitta in #470
Fix labeler by @akihironitta in #472
Block commits to local master by @akihironitta in #473
ci: Remove unnecessary install.yml by @akihironitta in #474
ci: Remove unnecessary python_testing.yml by @akihironitta in #475
ci: Shrink config matrix to run on PRs by @akihironitta in #476
ci: Add reusable workflow building macOS wheels by @akihironitta in #471
ci: Add reusable workflow building Windows wheels by @akihironitta in #477
Trim macOS and Windows build matrix by @akihironitta in #482
Skip test cases on CPU Windows due to PyTorch 2.4.0 bug by @akihironitta in #483
Remove Python.h by @akihironitta in #462
Fix automerge workflow by @akihironitta in https://github.com/pyg-t...

pyg-lib==0.4.0 brings PyTorch 2.2 support, distributed neighbor sampling, accelerated softmax operations, and edge-level temporal sampling support to PyG 🎉🎉🎉

Highlights

PyTorch 2.2 Support

pyg-lib==0.4.0 is fully compatible with PyTorch 2.2 (#294). To install for PyTorch 2.2, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu118 or cu121.

The following combinations are supported:

PyTorch 2.2	`cpu`	`cu118`	`cu121`
Linux	✅	✅	✅
macOS	✅

Older PyTorch versions like PyTorch 1.12, 1.13, 2.0.0 and 2.1.0 are still supported, and can be installed as described in our README.md.

Distributed Sampling

pyg-lib==0.4.0 integrates all the low-level code for performing distributed neighbor sampling as part of torch_geometric.distributed in PyG 2.5 (#246, #252, #253, #254).

Sparse Softmax Implementation

pyg-lib==0.4.0 supports a fast sparse softmax_csr implementation based on CSR input representation (#264, #282):

import torch
from pyg_lib.ops import softmax_csr

src = torch.randn(4, 4)
ptr = torch.tensor([0, 4])
out = softmax_csr(src, ptr)
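
To make the semantics concrete, here is a minimal pure-Python sketch of a segment-wise softmax over CSR-style boundaries. It assumes that ptr delimits consecutive row segments along the first dimension and that the softmax is taken per column within each segment. The helper softmax_csr_reference is hypothetical and not part of pyg-lib; it only illustrates the expected result, not the optimized kernel.

```python
import math
from typing import List


def softmax_csr_reference(src: List[List[float]], ptr: List[int]) -> List[List[float]]:
    """Softmax over row segments src[ptr[i]:ptr[i + 1]], column by column.

    Hypothetical reference helper (assumption), not a pyg-lib function.
    """
    out = [[0.0] * len(row) for row in src]
    for start, end in zip(ptr[:-1], ptr[1:]):
        if start == end:  # empty segment, nothing to normalize
            continue
        for j in range(len(src[start])):
            column = [src[i][j] for i in range(start, end)]
            peak = max(column)  # subtract the maximum for numerical stability
            exps = [math.exp(v - peak) for v in column]
            total = sum(exps)
            for offset, e in enumerate(exps):
                out[start + offset][j] = e / total
    return out


src = [[0.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
ptr = [0, 2, 3]  # two segments: rows 0-1 and row 2
out = softmax_csr_reference(src, ptr)
```

Within each segment, every column of the result sums to one.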

Edge-level Temporal Sampling

pyg-lib==0.4.0 brings edge-level temporal sampling support to PyG (#280). In particular, neighbor_sample and hetero_neighbor_sample now support the edge_time attribute, which restricts sampling to edges whose timestamp is lower than or equal to the corresponding seed_time.
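
The constraint can be sketched in a few lines of plain Python. The function temporally_valid_edges and its arguments are hypothetical names chosen for illustration; the sketch only shows the comparison described above, not how the sampler applies it internally.

```python
from typing import List


def temporally_valid_edges(
    edge_time: List[int],
    candidate_edges: List[int],
    seed_time: int,
) -> List[int]:
    """Keep the candidate edge ids whose timestamp is <= seed_time.

    Hypothetical helper (assumption), not a pyg-lib function.
    """
    return [e for e in candidate_edges if edge_time[e] <= seed_time]


edge_time = [3, 7, 5, 9]   # one timestamp per edge
candidates = [0, 1, 2, 3]  # edges incident to the current seed node
valid = temporally_valid_edges(edge_time, candidates, seed_time=5)
```

Here only edges 0 and 2 remain eligible, since edges 1 and 3 are newer than the seed time.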

Additional Features

Added support for bfloat16 data type in segment_matmul and grouped_matmul on CPU (#272)
Improved the runtime of biased sampling in neighbor_sample and hetero_neighbor_sample (#270)

Bugfixes

Dropped the MKL code path in neighbor_sample and hetero_neighbor_sample with replace=False since it did not correctly prevent duplicates (#275)
Fixed grouped_matmul in case input tensors are not contiguous (#290)

New Contributors

@pmpalang made their first contribution in #280
@Jokeren made their first contribution in #290

Full Changelog: 0.3.0...0.4.0

pyg-lib==0.3.1 includes a variety of bugfixes and improvements.

Bug Fixes

Fixed an issue introduced in pyg-lib==0.3.0 in which the replace=False option was not correctly respected during neighbor_sample (#275)
Fixed support for older GLIBC versions (#276)

Improvements

Biased neighbor_sample has been made approximately twice as fast (#270)
segment_matmul and grouped_matmul now support bfloat16 CPU tensors (#271)

Full Changelog: 0.3.0...0.3.1

pyg-lib==0.3.0 brings PyTorch 2.1 support, METIS partitioning and further neighbor sampling improvements to PyG 🎉🎉🎉

Highlights

PyTorch 2.1 Support

pyg-lib==0.3.0 is fully compatible with PyTorch 2.1 (#256). To install for PyTorch 2.1, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu118 or cu121.

The following combinations are supported:

PyTorch 2.1	`cpu`	`cu118`	`cu121`
Linux	✅	✅	✅
macOS	✅

Older PyTorch versions like PyTorch 1.12, 1.13 and 2.0.0 are still supported, and can be installed as described in our README.md. PyTorch 1.11 support has been dropped.

METIS partitioning

pyg-lib==0.3.0 enables METIS partitioning by introducing pyg_lib.partition (#229).

from pyg_lib.partition import metis

cluster = metis(rowptr, col, num_partitions)
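
The snippet above leaves rowptr and col undefined. As a rough sketch, assuming they are the usual CSR arrays of the graph (row offsets and column indices), they could be derived from an edge list as follows. The helper to_csr is hypothetical and works on plain Python lists.

```python
from typing import List, Tuple


def to_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert (source, target) pairs into CSR arrays (rowptr, col).

    Hypothetical helper (assumption), not a pyg-lib function.
    """
    edges = sorted(edges)  # group edges by source node
    rowptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        rowptr[src + 1] += 1  # count outgoing edges per node
    for i in range(num_nodes):
        rowptr[i + 1] += rowptr[i]  # prefix sum turns counts into offsets
    col = [dst for _, dst in edges]
    return rowptr, col


# A path graph 0 - 1 - 2 - 3, stored with both edge directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
rowptr, col = to_csr(4, edges)
```

The resulting lists can then be wrapped with torch.tensor(...) before being passed to metis together with the desired num_partitions.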

Neighbor Sampling Improvements

pyg-lib==0.3.0 brings various improvements to our neighbor sampling routine:

Support for biased/weighted sampling: pyg_lib.sampler.neighbor_sample and pyg_lib.sampler.hetero_neighbor_sample now support the additional edge_weight argument (#247, #251)
pyg_lib.sampler.hetero_neighbor_sample now performs neighborhood sampling across edge types in parallel (#211)
Added low-level support for distributed neighborhood sampling (#246, #252, #253, #254)
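
To illustrate what biased sampling means, the following sketch draws distinct neighbors with probability tied to their edge weight, using Efraimidis-Spirakis random keys. This is a standard textbook technique chosen for illustration; it is an assumption, not a description of how pyg-lib implements edge_weight support. The name weighted_sample is hypothetical.

```python
import random
from typing import List


def weighted_sample(
    neighbors: List[int],
    weights: List[float],
    k: int,
    rng: random.Random,
) -> List[int]:
    """Sample up to k distinct neighbors, favoring larger weights.

    Each neighbor gets the key u ** (1 / w) with u uniform in [0, 1);
    the k largest keys win. Hypothetical helper, not a pyg-lib function.
    """
    keyed = [
        (rng.random() ** (1.0 / w), n)
        for n, w in zip(neighbors, weights)
        if w > 0  # zero-weight edges can never be drawn
    ]
    keyed.sort(reverse=True)
    return [n for _, n in keyed[:k]]


rng = random.Random(0)
picked = weighted_sample([10, 11, 12, 13], [0.5, 0.0, 2.0, 1.0], k=2, rng=rng)
```

Neighbor 11 has weight zero and is therefore never returned, while neighbor 12 is the most likely pick.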

Additional Features

Added dispatch for XPU device in index_sort (#243)
Updated cutlass version for speed boosts in segment_matmul and grouped_matmul (#235)

Bugfixes

Fixed vector-based mapping issue in Mapping (#244)
Fixed performance issues reported by Coverity Tool (#240)
Fixed TorchScript support in grouped_matmul (#220)

New Contributors

@yaox12 made their first contribution in #213
@yanbing-j made their first contribution in #231
@akihironitta made their first contribution in #248

Full Changelog: 0.2.0...0.3.0

pyg-lib==0.2.0 brings PyTorch 2.0 support, sampled operations and further accelerations to PyG 🎉🎉🎉

Highlights

PyTorch 2.0 Support

pyg-lib==0.2.0 is fully compatible with PyTorch 2.0. To install for PyTorch 2.0, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.0.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu, cu117 or cu118.

The following combinations are supported:

PyTorch 2.0	`cpu`	`cu117`	`cu118`
Linux	✅	✅	✅
macOS	✅

Older PyTorch versions like PyTorch 1.11, 1.12 and 1.13 are still supported, and can be installed as described in our README.md.

Sampled Operations

We added support for sampled_op implementations (#156, #159, #160), which implement the scheme

out = left_tensor[left_index] (op) right_tensor[right_index]

efficiently without materializing intermediate representations:

from pyg_lib.ops import sampled_add

edge_index = ...
row, col = edge_index

# Replace ...
out = x[row] + x[col]

# ... with
out = sampled_add(left=x, right=x, left_index=row, right_index=col)

Supported operations are sampled_add, sampled_sub, sampled_mul and sampled_div.
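
As a reference for the expected result, the scheme can be written in plain Python. The helper sampled_add_reference is hypothetical; unlike the fused operation, it makes no attempt to avoid intermediate copies and only documents what the output should contain.

```python
from typing import List


def sampled_add_reference(
    left: List[List[float]],
    right: List[List[float]],
    left_index: List[int],
    right_index: List[int],
) -> List[List[float]]:
    """Compute left[left_index] + right[right_index] row by row.

    Hypothetical reference helper (assumption), not a pyg-lib function.
    """
    return [
        [a + b for a, b in zip(left[i], right[j])]
        for i, j in zip(left_index, right_index)
    ]


x = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
row = [0, 1, 2]
col = [1, 2, 0]
out = sampled_add_reference(x, x, row, col)
```

Swapping the inner a + b for a - b, a * b or a / b gives the analogous references for the other three operations.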

Further Accelerations

index_sort implements a much faster alternative to torch.sort() for sorting one-dimensional indices (#181, #192). This considerably speeds up dataset loading in PyG.
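
One reason sorting bounded, non-negative integer indices can beat a general comparison sort is that the values can be bucketed directly. The sketch below is a plain counting sort that returns both the sorted values and the permutation; it is an illustrative assumption and makes no claim about the algorithm pyg-lib actually uses. The name index_sort_reference and its max_value parameter are hypothetical.

```python
from typing import List, Tuple


def index_sort_reference(index: List[int], max_value: int) -> Tuple[List[int], List[int]]:
    """Stable counting sort; returns (sorted values, permutation).

    Hypothetical reference helper (assumption), not a pyg-lib function.
    """
    buckets: List[List[int]] = [[] for _ in range(max_value + 1)]
    for position, value in enumerate(index):
        buckets[value].append(position)  # remember where each value came from
    perm = [p for bucket in buckets for p in bucket]
    return [index[p] for p in perm], perm


values, perm = index_sort_reference([3, 0, 2, 0, 1], max_value=3)
```

The permutation can be reused to reorder any data aligned with the original indices, such as edge attributes.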

Optimized segment_matmul and grouped_matmul CPU implementations via MKL BLAS gemm_batch (#146, #172).

Breaking Changes

Temporal neighbor_sample and hetero_neighbor_sample will now sample nodes with the same or smaller timestamp than the seed node (changed from only sampling nodes with a smaller timestamp) (#187)

Full Changelog

Added

Added PyTorch 2.0 support (#214)
neighbor_sample routines now also return information about the number of sampled nodes/edges per layer (#197)
Added index_sort implementation (#181, #192)
Added triton>=2.0 support (#171)
Added bias term to grouped_matmul and segment_matmul (#161)
Added sampled_op implementation (#156, #159, #160)

Changed

Sample the nodes with the same timestamp as seed nodes (#187)
Added write-csv (saves benchmark results as csv file) and libraries (determines which libraries will be used in benchmark) parameters (#167)
Enable benchmarking of neighbor sampler on temporal graphs (#165)
Improved [segment|grouped]_matmul CPU implementation via at::matmul_out and MKL BLAS gemm_batch (#146, #172)

Full commit list: 0.1.0...0.2.0

We are proud to release pyg-lib==0.1.0, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG 🎉🎉🎉

Extensive documentation is provided here. Once pyg-lib is installed, it will get automatically picked up by PyG, e.g., during neighborhood sampling or during heterogeneous GNN execution, and will accelerate its computation.

Installation

You can install pyg-lib as described in our README.md:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

where

${TORCH} should be replaced by either 1.11.0, 1.12.0 or 1.13.0
${CUDA} should be replaced by either cpu, cu102, cu113, cu115, cu116 or cu117
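
For scripted installs, the URL pattern above can be assembled programmatically. The helper wheel_index_url is a hypothetical convenience function that only performs the string substitution; it does not check whether the given combination is actually published.

```python
def wheel_index_url(torch_version: str, cuda: str) -> str:
    """Fill the ${TORCH} and ${CUDA} placeholders of the wheel index URL.

    Hypothetical helper (assumption), not part of pyg-lib.
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda}.html"


url = wheel_index_url("1.13.0", "cu117")
```

The result can be passed to pip via the -f option, exactly as in the command above.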

The following combinations are supported:

PyTorch 1.13	`cpu`	`cu116`	`cu117`
Linux	✅	✅	✅
Windows
macOS	✅

PyTorch 1.12	`cpu`	`cu102`	`cu113`	`cu116`
Linux	✅	✅	✅	✅
Windows
macOS	✅

PyTorch 1.11	`cpu`	`cu102`	`cu113`	`cu115`
Linux	✅	✅	✅	✅
Windows
macOS	✅

Highlights

`pyg_lib.sampler`: Optimized homogeneous and heterogeneous neighborhood sampling

pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the previously used neighborhood sampling techniques utilized in PyG. For example, it pre-allocates random numbers, uses vector-based mapping for nodes in smaller node types, leverages a faster hashmap implementation, etc. Overall, it achieves speed-ups of about 10x-15x:

pyg_lib.sampler.neighbor_sample(
    rowptr: Tensor,
    col: Tensor,
    seed: Tensor,
    num_neighbors: List[int],
    time: Optional[Tensor] = None,
    seed_time: Optional[Tensor] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

and

pyg_lib.sampler.hetero_neighbor_sample(
    rowptr_dict: Dict[EdgeType, Tensor],
    col_dict: Dict[EdgeType, Tensor],
    seed_dict: Dict[NodeType, Tensor],
    num_neighbors_dict: Dict[EdgeType, List[int]],
    time_dict: Optional[Dict[NodeType, Tensor]] = None,
    seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

pyg_lib.sampler.neighbor_sample and pyg_lib.sampler.hetero_neighbor_sample recursively sample neighbors from all node indices in seed in the graph given by (rowptr, col). Both also support temporal sampling via the time argument, such that nodes that do not fulfill the temporal constraints indicated by seed_time are never sampled.
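
The recursive scheme can be sketched in plain Python for the homogeneous case. This is a simplified illustration under stated assumptions: it samples without replacement, treats a negative fan-out as "take all neighbors", ignores temporal constraints, and returns only the visited nodes. The function neighbor_sample_reference is hypothetical and does not mirror the library's optimized implementation or its return values.

```python
import random
from typing import List


def neighbor_sample_reference(
    rowptr: List[int],
    col: List[int],
    seed: List[int],
    num_neighbors: List[int],
    rng: random.Random,
) -> List[int]:
    """Multi-hop neighbor sampling on a CSR graph; returns visited nodes.

    Hypothetical reference helper (assumption), not a pyg-lib function.
    """
    visited = list(seed)
    seen = set(seed)
    frontier = list(seed)
    for fanout in num_neighbors:  # one entry per hop
        next_frontier = []
        for node in frontier:
            neighbors = col[rowptr[node]:rowptr[node + 1]]
            if fanout < 0 or fanout >= len(neighbors):
                sampled = list(neighbors)  # fewer neighbors than requested
            else:
                sampled = rng.sample(neighbors, fanout)
            for n in sampled:
                if n not in seen:  # map each node only once
                    seen.add(n)
                    visited.append(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return visited


# Edges: 0 -> {1, 2}, 1 -> {2, 3}, 2 -> {3}, node 3 has no neighbors.
rowptr = [0, 2, 4, 5, 5]
col = [1, 2, 2, 3, 3]
nodes = neighbor_sample_reference(
    rowptr, col, seed=[0], num_neighbors=[2, 1], rng=random.Random(0)
)
```

In this small graph every node is reached within two hops regardless of the random draw, so the visited order is 0, 1, 2, 3.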

`pyg_lib.ops`: Heterogeneous GNN acceleration

pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types:

segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor

pyg_lib.ops.segment_matmul performs dense-dense matrix multiplication according to segments along the first dimension of inputs as given by ptr.

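import torch
import pyg_lib

inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])
other = torch.randn(2, 16, 32)

out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
# Compare with a tolerance: asserting on an element-wise tensor comparison raises an error.
assert torch.allclose(out[0:5], inputs[0:5] @ other[0], atol=1e-5)
assert torch.allclose(out[5:8], inputs[5:8] @ other[1], atol=1e-5)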

Full Changelog

Added

Added PyTorch 1.13 support (#145)
Added native PyTorch support for grouped_matmul (#137)
Added fused_scatter_reduce operation for multiple reductions (#141, #142)
Added triton dependency (#133, #134)
Enable pytest testing (#132)
Added C++-based autograd and TorchScript support for segment_matmul (#120, #122)
Allow overriding time for seed nodes via seed_time in neighbor_sample (#118)
Added [segment|grouped]_matmul CPU implementation (#111)
Added temporal_strategy option to neighbor_sample (#114)
Added benchmarking tool (Google Benchmark) along with pyg::sampler::Mapper benchmark example (#101)
Added CSC mode to pyg::sampler::neighbor_sample and pyg::sampler::hetero_neighbor_sample (#95, #96)
Speed up pyg::sampler::neighbor_sample via IndexTracker implementation (#84)
Added pyg::sampler::hetero_neighbor_sample implementation (#90, #92, #94, #97, #98, #99, #102, #110)
Added pyg::utils::to_vector implementation (#88)
Added support for PyTorch 1.12 (#57, #58)
Added grouped_matmul and segment_matmul CUDA implementations via cutlass (#51, #56, #61, #64, #69, #73, #123)
Added pyg::sampler::neighbor_sample implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
Added pyg::sampler::Mapper utility for mapping global to local node indices (#45, #83)
Added benchmark script (#45, #79, #82, #91, #93, #106)
Added download script for benchmark data (#44)
Added biased sampling utils (#38)
Added CHANGELOG.md (#39)
Added pyg.subgraph() (#31)
Added nightly builds ([#28](https://github.com...
