Releases: pyg-team/pyg-lib
pyg-lib 0.5.0
We are excited to announce the release of pyg-lib 0.5 🎉🎉🎉
What's Changed
- Increment subgraph id globally for all seed nodes by @kgajdamo in #304
- Fix tests on macOS ARM by @rusty1s in #305
- fix bug to add large tensor support by @kaixuanliu in #308
- Add macOS M1 support by @rusty1s in #310
- Fix macOS nightly build by @rusty1s in #312
- Fix macOS install (part2) by @rusty1s in #313
- Build for Windows by @rusty1s in #315
- Update Windows build by @rusty1s in #317
- Add PyTorch 2.3 support by @rusty1s in #322
- Load `libpyg.so` first to let `torch.library.register_fake` find custom operators by @akihironitta in #329
- Ensure consistent line endings in the repository by @akihironitta in #332
- Add PyTorch 2.4 support by @rusty1s in #338
- Add PyTorch 2.4 support by @rusty1s in #339
- Drop Windows/PyTorch 2.0.0/`cu121` builds by @rusty1s in #340
- Add `sphinx-lint` by @rusty1s in #341
- Support for `sphinx_copybutton` by @rusty1s in #342
- Autodocument type hints by @rusty1s in #343
- Fix docs build in CI by @akihironitta in #351
- Drop Python 3.8 support by @akihironitta in #356
- Add PyTorch 2.5 support by @rusty1s in #360
- Fix typo in `README` by @rusty1s in #362
- Remove unused `enumerate` in `fused_scatter_reduce` by @akihironitta in #370
- Remove usage of `torch::autograd::Variable` by @akihironitta in #369
- Add `cuCollections` dependency by @rusty1s in #371
- Limit concurrency for nightly jobs by @akihironitta in #372
- Boilerplate for custom `HashMap` implementation by @rusty1s in #373
- Basic implementation of `CPUHashMap` by @rusty1s in #375
- Support various data types in `CPUHashMap` by @rusty1s in #376
- Add asserts and default get to `HashMap` by @rusty1s in #377
- Pybind support for `CPUHashMap` by @rusty1s in #378
- `CPUHashMap` benchmark by @rusty1s in #379
- Use multi-threading and parallel hash map in `CPUHashMap` by @rusty1s in #380
- Implement dynamic polymorphism in `CPUHashMap` by @rusty1s in #381
- Restructure file layout for `HashMap` - prepare CUDA version by @rusty1s in #382
- Fix documentation builds by @rusty1s in #384
- Update labeling procedure by @rusty1s in #385
- Re-structure includes by @rusty1s in #388
- Fix CI by @rusty1s in #390
- Add `CUDAHashMapImpl` via `cuCollections` by @rusty1s in #391
- Robustify `CUDAHashMap` implementation + add serialization by @rusty1s in #392
- Align `CPUHashMap` and `CUDAHashMap` implementations by @rusty1s in #393
- [Bug] Fixing bug in `fused_scatter_reduce` function. by @drivanov in #394
- Introduce `load_factor` - use `dtype.int` by default in `HashMap` benchmark by @rusty1s in #396
- Debug Windows build by @rusty1s in #397
- Disable Windows support in `CUDAHashMap` by @rusty1s in #398
- Support `num_submaps` in `CPUHashMap` by @rusty1s in #399
- Remove unnecessary `argsort` in `CUDAHashMap.keys()` by @rusty1s in #400
- Add tests for `HashMap` python bindings by @rusty1s in #401
- Update benchmark script to respect `num_submaps` by @rusty1s in #402
- Expose `size`, `dtype` and `device` in `HashMap` by @rusty1s in #403
- CI: Auto-merge PRs from bots by @akihironitta in #405
- Replace `c10::optional` with equivalent `std::optional` by @akihironitta in #406
- `NeighborSampler` boilerplate by @rusty1s in #413
- Revert: "Replace `c10::optional` with equivalent `std::optional`" by @rusty1s in #416
- Implement a basic hetero sampler in the class by @vid-koci in #415
- Add `MetapathTracker` helper class to `NeighborSampler` by @vid-koci in #417
- Oversample metapaths where number of samples was lower than expected by @vid-koci in #419
- PyTorch 2.6 support by @rusty1s in #421
- Fix `SetDevice` by @rusty1s in #422
- Correctly set CUDA device by @rusty1s in #423
- Add `pyproject.toml` by @rusty1s in #426
- Move stylers/linters to `pyproject.toml` by @rusty1s in #428
- Python 3.13 support by @rusty1s in #429
- Replace random shuffle with a sort in `MetapathAwareNeighborSampler` by @vid-koci in #425
- Bump CI's ubuntu to latest by @Kh4L in #432
- Add CXX11 ABI build support by @Kh4L in #431
- Only trigger automerge workflow on opening a PR by @akihironitta in #434
- Add `NO_METIS` flag by @rusty1s in #436
- Include `Dispatch.h` by @rusty1s in #437
- Fix the NO_METIS env var by @vid-koci in #438
- Add a header for the Neighbor sampler class by @vid-koci in #439
- Additional 0 neighbor checks and a test by @vid-koci in #440
- Fix CUDA architectures to build for in CI by @akihironitta in #443
- ci: Tiny clean up by @akihironitta in #444
- Avoid deprecated format in `project.license` field in `pyproject.toml` by @akihironitta in #446
- Revert "Avoid deprecated format in `project.license` field in `pyproject.toml` (#446)" by @akihironitta in #449
- Fix typo in README.md by @akihironitta in #450
- Support PyTorch 2.7 and CUDA 12.8 by @akihironitta in #442
- ci: Set up dependabot by @akihironitta in #451
- Update `README.md` on PyTorch 2.7 and CUDA 12.8 support by @akihironitta in #456
- Remove PyTorch 1.12 handling from CI by @akihironitta in #459
- Build for all Python versions on Linux by @rusty1s in #465
- Fix Windows build by @rusty1s in #466
- ci: Cancel running jobs when a new commit is pushed to PR by @akihironitta in #469
- ci: Prepare for decoupling build configs from workflows by @akihironitta in #468
- ci: Add reusable workflow building Linux wheels by @akihironitta in #470
- Fix labeler by @akihironitta in #472
- Block commits to local master by @akihironitta in #473
- ci: Remove unnecessary `install.yml` by @akihironitta in #474
- ci: Remove unnecessary `python_testing.yml` by @akihironitta in #475
- ci: Shrink config matrix to run on PRs by @akihironitta in #476
- ci: Add reusable workflow building macOS wheels by @akihironitta in #471
- ci: Add reusable workflow building Windows wheels by @akihironitta in #477
- Trim macOS and Windows build matrix by @akihironitta in #482
- Skip test cases on CPU Windows due to PyTorch 2.4.0 bug by @akihironitta in #483
- Remove `Python.h` by @akihironitta in #462
- Fix automerge workflow by @akihironitta in https://github.com/pyg-t...
pyg-lib 0.4.0: PyTorch 2.2 support, distributed sampling, sparse softmax, edge-level temporal sampling
pyg-lib==0.4.0 brings PyTorch 2.2 support, distributed neighbor sampling, accelerated softmax operations, and edge-level temporal sampling support to PyG 🎉🎉🎉
Highlights
PyTorch 2.2 Support
pyg-lib==0.4.0 is fully compatible with PyTorch 2.2 (#294). To install for PyTorch 2.2, simply run
pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+${CUDA}.html
where `${CUDA}` should be replaced by either `cpu`, `cu118` or `cu121`.
The following combinations are supported:
| PyTorch 2.2 | `cpu` | `cu118` | `cu121` |
|---|---|---|---|
| Linux | ✅ | ✅ | ✅ |
| macOS | ✅ | | |
Older PyTorch versions like PyTorch 1.12, 1.13, 2.0.0 and 2.1.0 are still supported, and can be installed as described in our README.md.
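The wheel index URL follows a fixed pattern, so it can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `wheel_index_url` (it is not part of pyg-lib) and the `cpu`, `cu118` and `cu121` variants listed above for PyTorch 2.2:

```python
# Hypothetical helper, not part of pyg-lib: builds the `-f` index URL
# used by `pip install pyg-lib -f <url>`.
SUPPORTED_CUDA_FOR_TORCH_2_2 = {"cpu", "cu118", "cu121"}


def wheel_index_url(torch_version: str, cuda: str) -> str:
    """Return the wheel index URL for a PyTorch version and CUDA variant."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda}.html"


def wheel_index_url_for_torch_2_2(cuda: str) -> str:
    """Same as above, but rejects variants not listed for PyTorch 2.2."""
    if cuda not in SUPPORTED_CUDA_FOR_TORCH_2_2:
        raise ValueError(f"unsupported CUDA variant for PyTorch 2.2: {cuda!r}")
    return wheel_index_url("2.2.0", cuda)


print(wheel_index_url_for_torch_2_2("cu121"))
# prints https://data.pyg.org/whl/torch-2.2.0+cu121.html
```

The resulting string is what would be passed after `-f` in the `pip install` command.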
Distributed Sampling
pyg-lib==0.4.0 integrates all the low-level code for performing distributed neighbor sampling as part of torch_geometric.distributed in PyG 2.5 (#246, #252, #253, #254).
Sparse Softmax Implementation
pyg-lib==0.4.0 supports a fast sparse softmax_csr implementation based on CSR input representation (#264, #282):
import torch
from pyg_lib.ops import softmax_csr
src = torch.randn(4, 4)
ptr = torch.tensor([0, 4])
out = softmax_csr(src, ptr)
Edge-level Temporal Sampling
pyg-lib==0.4.0 brings edge-level temporal sampling support to PyG (#280). In particular, `neighbor_sample` and `hetero_neighbor_sample` now support the `edge_time` attribute, which restricts sampling to edges whose timestamp is lower than or equal to their corresponding `seed_time`.
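The temporal constraint itself is a simple comparison. The sketch below is a pure-Python illustration of that rule only, not the pyg-lib implementation; the function name `temporally_valid_edges` and the toy timestamps are assumptions made for the example:

```python
from typing import List


def temporally_valid_edges(edge_time: List[int], seed_time: int) -> List[int]:
    """Return the positions of edges whose timestamp is <= seed_time."""
    return [i for i, t in enumerate(edge_time) if t <= seed_time]


# Four candidate edges of one seed node with seed_time = 5.
edge_time = [3, 7, 5, 9]
print(temporally_valid_edges(edge_time, seed_time=5))  # prints [0, 2]
```

Note that the edge with timestamp 5 is kept, since equality with `seed_time` is allowed.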
Additional Features
- Added support for `bfloat16` data type in `segment_matmul` and `grouped_matmul` on CPU (#272)
- Improved the runtime of biased sampling in `neighbor_sample` and `hetero_neighbor_sample` (#270)
Bugfixes
- Dropped the MKL code path in `neighbor_sample` and `hetero_neighbor_sample` with `replace=False` since it did not correctly prevent duplicates (#275)
- Fixed `grouped_matmul` in case input tensors are not contiguous (#290)
Full Changelog: 0.3.0...0.4.0
pyg-lib 0.3.1: Bugfixes
pyg-lib==0.3.1 includes a variety of bugfixes and improvements.
Bug Fixes
- Fixed an issue introduced in `pyg-lib==0.3.0` in which the `replace=False` option was not correctly respected during `neighbor_sample` (#275)
- Fixed support for older `GLIBC` versions (#276)
Improvements
- Biased `neighbor_sample` has been made approximately twice as fast (#270)
- `segment_matmul` and `grouped_matmul` now support `bfloat16` CPU tensors (#271)
Full Changelog: 0.3.0...0.3.1
pyg-lib 0.3.0: PyTorch 2.1 support, METIS partitioning, neighbor sampler improvements
pyg-lib==0.3.0 brings PyTorch 2.1 support, METIS partitioning and further neighbor sampling improvements to PyG 🎉🎉🎉
Highlights
PyTorch 2.1 Support
pyg-lib==0.3.0 is fully compatible with PyTorch 2.1 (#256). To install for PyTorch 2.1, simply run
pip install pyg-lib -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html
where `${CUDA}` should be replaced by either `cpu`, `cu118` or `cu121`.
The following combinations are supported:
| PyTorch 2.1 | `cpu` | `cu118` | `cu121` |
|---|---|---|---|
| Linux | ✅ | ✅ | ✅ |
| macOS | ✅ | | |
Older PyTorch versions like PyTorch 1.12, 1.13 and 2.0.0 are still supported, and can be installed as described in our README.md. PyTorch 1.11 support has been dropped.
METIS partitioning
pyg-lib==0.3.0 enables METIS partitioning by introducing `pyg_lib.partition` (#229).
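The `metis` call takes the graph as `rowptr` and `col`, names that suggest a compressed sparse row (CSR) layout. As a minimal sketch of what those two arrays contain, assuming a toy undirected path graph and a hypothetical helper `to_csr` (plain Python lists are used here; in practice they would be converted to tensors, e.g. via `torch.tensor`):

```python
from typing import List, Tuple


def to_csr(num_nodes: int,
           edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert (source, target) pairs into CSR arrays (rowptr, col)."""
    edges = sorted(edges)  # group edges by source node
    rowptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        rowptr[src + 1] += 1  # count outgoing edges per node
    for i in range(num_nodes):
        rowptr[i + 1] += rowptr[i]  # prefix sum gives row offsets
    col = [dst for _, dst in edges]
    return rowptr, col


# Undirected path 0 - 1 - 2 - 3, stored with both edge directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
rowptr, col = to_csr(4, edges)
print(rowptr)  # prints [0, 1, 3, 5, 6]
print(col)     # prints [1, 0, 2, 1, 3, 2]
```

The neighbors of node `i` are then `col[rowptr[i]:rowptr[i + 1]]`.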
from pyg_lib.partition import metis
cluster = metis(rowptr, col, num_partitions)
Neighbor Sampling Improvements
pyg-lib==0.3.0 brings various improvements to our neighbor sampling routine:
- Support for biased/weighted sampling: `pyg_lib.sampler.neighbor_sample` and `pyg_lib.sampler.hetero_neighbor_sample` now support the additional `edge_weight` argument (#247, #251)
- `pyg_lib.sampler.hetero_neighbor_sample` now performs neighborhood sampling across edge types in parallel (#211)
- Added low-level support for distributed neighborhood sampling (#246, #252, #253, #254)
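To make the idea of biased/weighted sampling concrete, the following pure-Python sketch draws neighbors of a single node with probability proportional to a per-edge weight. It is an illustration only: the function name is hypothetical, it samples with replacement via `random.choices`, and it does not claim to match how pyg-lib handles `edge_weight` internally.

```python
import random
from typing import List


def sample_weighted_neighbors(rowptr: List[int], col: List[int],
                              edge_weight: List[float], node: int, k: int,
                              rng: random.Random) -> List[int]:
    """Draw k neighbors of `node`, proportionally to their edge weights."""
    start, end = rowptr[node], rowptr[node + 1]
    neighbors = col[start:end]
    if not neighbors:
        return []
    # random.choices samples WITH replacement, proportionally to `weights`.
    return rng.choices(neighbors, weights=edge_weight[start:end], k=k)


# Node 0 has neighbors 1 and 2; the edge to node 1 has zero weight.
rowptr = [0, 2, 3, 3]
col = [1, 2, 0]
edge_weight = [0.0, 1.0, 1.0]
picked = sample_weighted_neighbors(rowptr, col, edge_weight, 0, 3,
                                   random.Random(0))
print(picked)  # prints [2, 2, 2]
```

Because the edge to node 1 carries zero weight, only node 2 can ever be drawn, whatever the seed.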
Additional Features
- Added dispatch for XPU device in `index_sort` (#243)
- Updated `cutlass` version for speed boosts in `segment_matmul` and `grouped_matmul` (#235)
Bugfixes
- Fixed vector-based mapping issue in `Mapping` (#244)
- Fixed performance issues reported by Coverity Tool (#240)
- Fixed TorchScript support in `grouped_matmul` (#220)
New Contributors
- @yaox12 made their first contribution in #213
- @yanbing-j made their first contribution in #231
- @akihironitta made their first contribution in #248
Full Changelog: 0.2.0...0.3.0
pyg-lib 0.2.0: PyTorch 2.0 support, sampled operations, and further accelerations
pyg-lib==0.2.0 brings PyTorch 2.0 support, sampled operations and further accelerations to PyG 🎉🎉🎉
Highlights
PyTorch 2.0 Support
pyg-lib==0.2.0 is fully compatible with PyTorch 2.0. To install for PyTorch 2.0, simply run
pip install pyg-lib -f https://data.pyg.org/whl/torch-2.0.0+${CUDA}.html
where `${CUDA}` should be replaced by either `cpu`, `cu117` or `cu118`.
The following combinations are supported:
| PyTorch 2.0 | `cpu` | `cu117` | `cu118` |
|---|---|---|---|
| Linux | ✅ | ✅ | ✅ |
| macOS | ✅ | | |
Older PyTorch versions like PyTorch 1.11, 1.12 and 1.13 are still supported, and can be installed as described in our README.md.
Sampled Operations
We added support for `sampled_op` implementations (#156, #159, #160), which implement the scheme `out = left_tensor[left_index] (op) right_tensor[right_index]` efficiently without materializing intermediate representations:
from pyg_lib.ops import sampled_add
edge_index = ...
row, col = edge_index
# Replace ...
out = x[row] + x[col]
# ... with
out = sampled_add(left=x, right=x, left_index=row, right_index=col)
Supported operations are `sampled_add`, `sampled_sub`, `sampled_mul` and `sampled_div`.
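The scheme `out = left_tensor[left_index] (op) right_tensor[right_index]` can be spelled out element by element. The sketch below is a pure-Python reference of those semantics on flat lists; `sampled_op_reference` is a hypothetical name, and the code says nothing about how pyg-lib fuses the gather and the arithmetic:

```python
import operator
from typing import Callable, List


def sampled_op_reference(left: List[float], right: List[float],
                         left_index: List[int], right_index: List[int],
                         op: Callable[[float, float], float]) -> List[float]:
    """Compute op(left[i], right[j]) for each aligned index pair (i, j)."""
    return [op(left[i], right[j]) for i, j in zip(left_index, right_index)]


x = [1.0, 2.0, 4.0]
row = [0, 1, 2]
col = [2, 0, 1]
print(sampled_op_reference(x, x, row, col, operator.add))  # prints [5.0, 3.0, 6.0]
print(sampled_op_reference(x, x, row, col, operator.sub))  # prints [-3.0, 1.0, 2.0]
```

Each output entry reads one value from each side, so no gathered copy of `x[row]` or `x[col]` is needed before the operation is applied.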
Further Accelerations
- `index_sort` implements a (way) faster alternative to sorting one-dimensional indices compared to `torch.sort()` (#181, #192). This heavily reduces dataset loading times in PyG.
- Optimized `segment_matmul` and `grouped_matmul` CPU implementations via MKL BLAS `gemm_batch` (#146, #172).
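To illustrate what sorting one-dimensional indices produces, here is a pure-Python reference. It assumes the result is a pair of sorted values and original positions, as with `torch.sort()`. The name `index_sort_reference` is hypothetical, and the counting sort is only a stand-in that does not claim to mirror pyg-lib's implementation:

```python
from typing import List, Tuple


def index_sort_reference(index: List[int]) -> Tuple[List[int], List[int]]:
    """Stable counting sort of non-negative ints.

    Returns (sorted_values, perm) with sorted_values[k] == index[perm[k]].
    """
    if not index:
        return [], []
    counts = [0] * (max(index) + 2)
    for v in index:
        counts[v + 1] += 1
    for v in range(1, len(counts)):
        counts[v] += counts[v - 1]
    # counts[v] now holds the output slot of the next occurrence of v.
    out = [0] * len(index)
    perm = [0] * len(index)
    for pos, v in enumerate(index):
        slot = counts[v]
        out[slot] = v
        perm[slot] = pos
        counts[v] += 1
    return out, perm


values, perm = index_sort_reference([2, 0, 1, 0])
print(values)  # prints [0, 0, 1, 2]
print(perm)    # prints [1, 3, 2, 0]
```

Index arrays are bounded by the number of nodes, which is why non-comparison sorts of this kind can beat a general-purpose sort on such inputs.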
Breaking Changes
- Temporal `neighbor_sample` and `hetero_neighbor_sample` will now sample nodes with the same or smaller timestamp than the seed node (changed from only sampling nodes with a smaller timestamp) (#187)
Full Changelog
Added
- Added PyTorch 2.0 support (#214)
- `neighbor_sample` routines now also return information about the number of sampled nodes/edges per layer (#197)
- Added `index_sort` implementation (#181, #192)
- Added `triton>=2.0` support (#171)
- Added `bias` term to `grouped_matmul` and `segment_matmul` (#161)
- Added `sampled_op` implementation (#156, #159, #160)
Changed
- Sample the nodes with the same timestamp as seed nodes (#187)
- Added `write-csv` (saves benchmark results as csv file) and `libraries` (determines which libraries will be used in benchmark) parameters (#167)
- Enable benchmarking of neighbor sampler on temporal graphs (#165)
- Improved `[segment|grouped]_matmul` CPU implementation via `at::matmul_out` and MKL BLAS `gemm_batch` (#146, #172)
Full commit list: 0.1.0...0.2.0
pyg-lib 0.1.0: Optimized neighborhood sampling and heterogeneous GNN acceleration
We are proud to release pyg-lib==0.1.0, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG 🎉🎉🎉
Extensive documentation is provided here. Once pyg-lib is installed, it will get automatically picked up by PyG, e.g., during neighborhood sampling or during heterogeneous GNN execution, and will accelerate its computation.
Installation
You can install pyg-lib as described in our README.md:
pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
where
- `${TORCH}` should be replaced by either `1.11.0`, `1.12.0` or `1.13.0`
- `${CUDA}` should be replaced by either `cpu`, `cu102`, `cu113`, `cu115`, `cu116` or `cu117`
The following combinations are supported:
| PyTorch 1.13 | `cpu` | `cu102` | `cu113` | `cu115` | `cu116` | `cu117` |
|---|---|---|---|---|---|---|
| Linux | ✅ | | | | ✅ | ✅ |
| Windows | | | | | | |
| macOS | ✅ | | | | | |
| PyTorch 1.12 | `cpu` | `cu102` | `cu113` | `cu115` | `cu116` | `cu117` |
|---|---|---|---|---|---|---|
| Linux | ✅ | ✅ | ✅ | | ✅ | |
| Windows | | | | | | |
| macOS | ✅ | | | | | |
| PyTorch 1.11 | `cpu` | `cu102` | `cu113` | `cu115` | `cu116` | `cu117` |
|---|---|---|---|---|---|---|
| Linux | ✅ | ✅ | ✅ | ✅ | | |
| Windows | | | | | | |
| macOS | ✅ | | | | | |
Highlights
pyg_lib.sampler: Optimized homogeneous and heterogeneous neighborhood sampling
pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the neighborhood sampling techniques previously used in PyG. For example, it pre-allocates random numbers, uses vector-based mapping for nodes in smaller node types, leverages a faster hashmap implementation, etc. Overall, it achieves speed-ups of about 10x-15x:
pyg_lib.sampler.neighbor_sample(
rowptr: Tensor,
col: Tensor,
seed: Tensor,
num_neighbors: List[int],
time: Optional[Tensor] = None,
seed_time: Optional[Tensor] = None,
csc: bool = False,
replace: bool = False,
directed: bool = True,
disjoint: bool = False,
temporal_strategy: str = 'uniform',
return_edge_id: bool = True,
)
and
pyg_lib.sampler.hetero_neighbor_sample(
rowptr_dict: Dict[EdgeType, Tensor],
col_dict: Dict[EdgeType, Tensor],
seed_dict: Dict[NodeType, Tensor],
num_neighbors_dict: Dict[EdgeType, List[int]],
time_dict: Optional[Dict[NodeType, Tensor]] = None,
seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
csc: bool = False,
replace: bool = False,
directed: bool = True,
disjoint: bool = False,
temporal_strategy: str = 'uniform',
return_edge_id: bool = True,
)
`pyg_lib.sampler.neighbor_sample` and `pyg_lib.sampler.hetero_neighbor_sample` recursively sample neighbors from all node indices in `seed` in the graph given by `(rowptr, col)`. Both also support temporal sampling via the `time` argument, such that only nodes that fulfill the temporal constraints indicated by `seed_time` are sampled.
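The recursive scheme can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions, not the pyg-lib implementation: the function name is hypothetical, only the homogeneous, non-temporal, `replace=False` case is covered, and each hop expands only the nodes newly discovered in the previous hop.

```python
import random
from typing import List


def neighbor_sample_reference(rowptr: List[int], col: List[int],
                              seed: List[int], num_neighbors: List[int],
                              rng: random.Random) -> List[int]:
    """Return the sampled node ids, seed nodes first, in discovery order."""
    nodes = list(seed)
    seen = set(seed)
    frontier = list(seed)
    for k in num_neighbors:  # one entry per hop
        next_frontier = []
        for node in frontier:
            neighbors = col[rowptr[node]:rowptr[node + 1]]
            # Without replacement: at most k distinct neighbors per node.
            picked = rng.sample(neighbors, min(k, len(neighbors)))
            for nbr in picked:
                if nbr not in seen:
                    seen.add(nbr)
                    nodes.append(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return nodes


# Path graph 0 - 1 - 2 - 3 in CSR form, sampled for two hops from node 0.
rowptr = [0, 1, 3, 5, 6]
col = [1, 0, 2, 1, 3, 2]
print(neighbor_sample_reference(rowptr, col, [0], [2, 2], random.Random(0)))
# prints [0, 1, 2]
```

On this toy graph every node has at most two neighbors, so requesting two per hop selects all of them and the result does not depend on the random seed.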
pyg_lib.ops: Heterogeneous GNN acceleration
pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types:
segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor
`pyg_lib.ops.segment_matmul` performs dense-dense matrix multiplication according to segments along the first dimension of `inputs` as given by `ptr`.
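import torch
import pyg_lib
inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])
other = torch.randn(2, 16, 32)
out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
# `==` on tensors is element-wise and cannot be asserted directly, so compare with a tolerance.
assert torch.allclose(out[0:5], inputs[0:5] @ other[0])
assert torch.allclose(out[5:8], inputs[5:8] @ other[1])
Full Changelog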
Added
- Added PyTorch 1.13 support (#145)
- Added native PyTorch support for `grouped_matmul` (#137)
- Added `fused_scatter_reduce` operation for multiple reductions (#141, #142)
- Added `triton` dependency (#133, #134)
- Enable `pytest` testing (#132)
- Added C++-based autograd and TorchScript support for `segment_matmul` (#120, #122)
- Allow overriding `time` for seed nodes via `seed_time` in `neighbor_sample` (#118)
- Added `[segment|grouped]_matmul` CPU implementation (#111)
- Added `temporal_strategy` option to `neighbor_sample` (#114)
- Added benchmarking tool (Google Benchmark) along with `pyg::sampler::Mapper` benchmark example (#101)
- Added CSC mode to `pyg::sampler::neighbor_sample` and `pyg::sampler::hetero_neighbor_sample` (#95, #96)
- Speed up `pyg::sampler::neighbor_sample` via `IndexTracker` implementation (#84)
- Added `pyg::sampler::hetero_neighbor_sample` implementation (#90, #92, #94, #97, #98, #99, #102, #110)
- Added `pyg::utils::to_vector` implementation (#88)
- Added support for PyTorch 1.12 (#57, #58)
- Added `grouped_matmul` and `segment_matmul` CUDA implementations via `cutlass` (#51, #56, #61, #64, #69, #73, #123)
- Added `pyg::sampler::neighbor_sample` implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
- Added `pyg::sampler::Mapper` utility for mapping global to local node indices (#45, #83)
- Added benchmark script (#45, #79, #82, #91, #93, #106)
- Added download script for benchmark data (#44)
- Added `biased sampling` utils (#38)
- Added `CHANGELOG.md` (#39)
- Added `pyg.subgraph()` (#31)
- Added nightly builds ([#28](https://github.com...