Fix binary I/O for graphs with deleted nodes #1385
Pull Request Test Coverage Report for Build 21291359119
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Hi @fabratu, I’m a bit puzzled by the CPython 3.13 failures. They don’t seem related to this PR (the CPython 3.10 builds pass). The failure is in testEigenvectorsReverse, which checks a specific eigenvector entry. Since eigenvectors are only defined up to a scalar factor, this check seems brittle and likely sensitive to SciPy / BLAS / compiler differences. I suspect this is an environment issue rather than a regression from this PR. Happy to adjust the test (e.g. to a residual check) or follow your preferred approach.
You are right, this is an environmental regression (also happening on I have not yet looked into the respective code; for Will also add a review shortly.
Background
The NetworKit binary graph format (.nkb) is positional and node-ID based:
the reader decodes one entry per node ID u ∈ [0, header.nodes).
However, the writer previously serialized data only for existing nodes, which misaligned the stream whenever a graph contained deleted nodes (non-contiguous IDs). Reading such files could trigger assertion failures and crashes (see #1278).
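The misalignment can be illustrated with a toy positional format. This is a minimal sketch, not the actual .nkb layout: the 4-byte header, the fixed-width degree entries, and the function names `write_existing_only`, `write_all_ids`, and `read_degrees` are all assumptions made for illustration.

```python
import struct

def write_existing_only(degrees):
    """Toy writer that skips deleted IDs (degrees[u] is None for a deleted node)."""
    body = b"".join(struct.pack("<I", d) for d in degrees if d is not None)
    return struct.pack("<I", len(degrees)) + body  # header: size of the ID space

def write_all_ids(degrees):
    """Toy writer that emits one entry per node ID, 0 for deleted IDs."""
    body = b"".join(struct.pack("<I", 0 if d is None else d) for d in degrees)
    return struct.pack("<I", len(degrees)) + body

def read_degrees(blob):
    """Toy reader: decodes exactly one entry per ID in [0, header.nodes)."""
    (n,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<I", blob, 4 + 4 * u)[0] for u in range(n)]

degrees = [2, None, 1, 3]  # node ID 1 has been deleted

try:
    read_degrees(write_existing_only(degrees))
    reader_failed = False
except struct.error:
    # The reader expects 4 entries but the stream only holds 3.
    reader_failed = True

round_trip = read_degrees(write_all_ids(degrees))
```

In this toy model the reader runs past the end of the stream when the writer skips the deleted ID, while the writer that covers every ID round-trips with a zero in the hole.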
Summary of changes
Use upperNodeIdBound() instead of numberOfNodes()
numberOfNodes() counts only existing nodes, but the binary format must cover the entire node-ID space, including holes. Using upperNodeIdBound() ensures writer and reader agree on the node range.
Iterate over node IDs instead of G.forNodes(...)
G.forNodes(...) skips deleted node IDs. All writer loops that emit per-node positional data must therefore iterate over node IDs and explicitly encode deleted nodes as zero-degree entries. This applies to:
Explicitly handle deleted nodes during chunk size computation
Even deleted nodes must contribute a degree entry (0) so that offset tables match the actual byte layout.
Without this, adjacency data is decoded at incorrect positions.
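The difference between the two node counts, and its effect on offsets, can be sketched with a small stand-in graph. `ToyGraph`, `degree_entries`, and `adjacency_offsets` are hypothetical names for illustration only; the method names merely mirror numberOfNodes() and upperNodeIdBound(), and the real chunk size computation in the writer is likely more involved than a single prefix sum.

```python
from itertools import accumulate

class ToyGraph:
    """Hypothetical stand-in for a graph whose node IDs may contain holes."""

    def __init__(self, n):
        self.adj = {u: set() for u in range(n)}  # existing nodes only
        self.bound = n                           # size of the node-ID space

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_node(self, u):
        for v in self.adj.pop(u):
            if v in self.adj:
                self.adj[v].discard(u)

    def number_of_nodes(self):
        return len(self.adj)      # counts existing nodes only

    def upper_node_id_bound(self):
        return self.bound         # covers deleted IDs as well

    def has_node(self, u):
        return u in self.adj


def degree_entries(g):
    """One degree per node ID; deleted IDs contribute an explicit 0."""
    return [len(g.adj[u]) if g.has_node(u) else 0
            for u in range(g.upper_node_id_bound())]


def adjacency_offsets(degrees):
    """Start position of each ID's adjacency list (exclusive prefix sums)."""
    return list(accumulate(degrees, initial=0))[:-1]


g = ToyGraph(4)
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(2, 3)
g.remove_node(1)  # leaves a hole at ID 1

degrees = degree_entries(g)
offsets = adjacency_offsets(degrees)
```

Because the deleted ID still occupies a slot in `degrees`, the offset of every later ID lines up with the position a reader iterating over `range(upper_node_id_bound())` would expect.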
Fix inverted DELETED_BIT semantics
The previous implementation had inverted semantics for DELETED_BIT in both writer and reader:
- the writer set DELETED_BIT for existing nodes
- the reader treated a node as deleted when DELETED_BIT was not set
These two inversions accidentally canceled out for dense graphs, masking the bug. This PR fixes the semantics to match the name and intended meaning:
- the writer sets DELETED_BIT only for deleted nodes
- the reader treats a node as deleted when DELETED_BIT is set
Result
testWriteReadNonContinuous and testWriteReadNonContinuousDirected mentioned in NetworkitBinaryReader/Writer + deleted nodes + deleted edges #1278.
Fixes #1278
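How two inverted uses of DELETED_BIT can mask each other is sketched below. This is an illustrative model only: the bit value `0x1` and the helper names `encode_flags` and `decode_exists` are assumptions, not the actual constants or functions in the reader and writer.

```python
DELETED_BIT = 0x1  # hypothetical bit value, chosen for illustration

def encode_flags(exists, inverted=False):
    """Return the per-node flag byte. inverted=True models the old writer."""
    set_bit = exists if inverted else not exists
    return DELETED_BIT if set_bit else 0

def decode_exists(flags, inverted=False):
    """Return True if the node exists. inverted=True models the old reader."""
    bit_set = bool(flags & DELETED_BIT)
    return bit_set if inverted else not bit_set

# Old writer + old reader: the two inversions cancel out.
old_pair = [decode_exists(encode_flags(e, inverted=True), inverted=True)
            for e in (True, False)]

# Fixed writer + fixed reader: the bit means what its name says.
new_pair = [decode_exists(encode_flags(e)) for e in (True, False)]

# Old writer + fixed reader: every node's status is flipped.
mixed = [decode_exists(encode_flags(e, inverted=True)) for e in (True, False)]
```

In this model the old pair and the fixed pair both round-trip, which is consistent with the bug staying hidden, while any mix of old and fixed code flips the status of every node.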