Contributor

@Schwarf Schwarf commented Jan 10, 2026

Background

The NetworKit binary graph format (.nkb) is positional and node-ID based:
the reader decodes one entry per node ID u ∈ [0, header.nodes).
However, the writer previously serialized data only for existing nodes, which caused stream misalignment when graphs contained deleted nodes (non-contiguous IDs). This could result in assertion failures and crashes when reading such files (see #1278).
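
The mismatch can be illustrated with a minimal sketch. `ToyGraph` and the two helper functions below are hypothetical stand-ins written for this example, not the NetworKit `Graph` class or the actual writer code; only the names `numberOfNodes()` and `upperNodeIdBound()` mirror the methods discussed in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a graph with deleted nodes (not the NetworKit Graph class).
struct ToyGraph {
    std::vector<bool> exists;                     // exists[u] is false for deleted IDs
    std::vector<std::vector<std::uint64_t>> adj;  // adjacency list per node ID

    // Counts only the nodes that still exist.
    std::uint64_t numberOfNodes() const {
        std::uint64_t n = 0;
        for (bool e : exists) n += e ? 1 : 0;
        return n;
    }
    // Size of the node-ID space, including holes left by deleted nodes.
    std::uint64_t upperNodeIdBound() const { return exists.size(); }
};

// Sketch of the old behaviour: one degree entry per existing node only.
std::vector<std::uint64_t> degreesExistingOnly(const ToyGraph &g) {
    assert(g.exists.size() == g.adj.size());
    std::vector<std::uint64_t> out;
    for (std::uint64_t u = 0; u < g.upperNodeIdBound(); ++u)
        if (g.exists[u]) out.push_back(g.adj[u].size());
    return out;
}

// Positional behaviour: one entry per node ID, deleted IDs contribute 0.
std::vector<std::uint64_t> degreesPerNodeId(const ToyGraph &g) {
    assert(g.exists.size() == g.adj.size());
    std::vector<std::uint64_t> out;
    for (std::uint64_t u = 0; u < g.upperNodeIdBound(); ++u)
        out.push_back(g.exists[u] ? g.adj[u].size() : 0);
    return out;
}

// Path 0-1-2-3 after deleting node 1: IDs 0, 2, 3 remain, only edge 2-3 survives.
ToyGraph makeExample() {
    ToyGraph g;
    g.exists = {true, false, true, true};
    g.adj.resize(4);
    g.adj[2].push_back(3);
    g.adj[3].push_back(2);
    return g;
}
```

For the example graph, `degreesExistingOnly` yields three entries while a reader looping over four node IDs would consume four, so every entry after the hole is attributed to the wrong node.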

Summary of changes

  1. Use upperNodeIdBound() instead of numberOfNodes()
    numberOfNodes() counts only existing nodes, but the binary format must cover the entire node-ID space, including holes. Using upperNodeIdBound() ensures writer and reader agree on the node range.

  2. Iterate over node IDs instead of G.forNodes(...)
    G.forNodes(...) skips deleted node IDs.
    All writer loops that emit per-node positional data must therefore iterate over node IDs and explicitly encode deleted nodes as zero-degree entries. This applies to:

    • base node flags
    • adjacency lists
    • transpose lists
    • weights
    • edge IDs

  3. Explicitly handle deleted nodes during chunk size computation
    Even deleted nodes must contribute a degree entry (0) so that offset tables match the actual byte layout.
    Without this, adjacency data is decoded at incorrect positions.

  4. Fix inverted DELETED_BIT semantics
    The previous implementation had inverted semantics for DELETED_BIT in both writer and reader:

    • the writer set DELETED_BIT for existing nodes
    • the reader removed nodes when DELETED_BIT was not set
    These two inversions accidentally canceled out for dense graphs (graphs without deleted node IDs), masking the bug. This PR fixes the semantics to match the name and intended meaning:
    • the writer sets DELETED_BIT only for deleted nodes
    • the reader removes nodes when DELETED_BIT is set
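
The four changes above can be sketched together as a round trip through a toy positional layout. Everything here is an assumption made for illustration: the layout (`[n][flags][degree, neighbours...]` stored as 64-bit words), the `ToyGraph` struct, the functions `writeToy` / `readToy`, and the numeric value of `DELETED_BIT` are hypothetical and much simpler than the real .nkb format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a graph with deleted nodes (not the NetworKit Graph class).
struct ToyGraph {
    std::vector<bool> exists;                     // exists[u] is false for deleted IDs
    std::vector<std::vector<std::uint64_t>> adj;  // adjacency list per node ID
};

// Hypothetical flag value; the real constant may differ.
constexpr std::uint64_t DELETED_BIT = 0x1;

// Toy layout: [n] [flags x n] [(degree, neighbours...) x n]
std::vector<std::uint64_t> writeToy(const ToyGraph &g) {
    assert(g.exists.size() == g.adj.size());
    // The header stores the upper node ID bound, not the number of live nodes.
    const std::uint64_t n = g.exists.size();
    std::vector<std::uint64_t> out;
    out.push_back(n);
    // Flags: DELETED_BIT is set only for deleted nodes.
    for (std::uint64_t u = 0; u < n; ++u)
        out.push_back(g.exists[u] ? 0 : DELETED_BIT);
    // Adjacency: iterate over node IDs, not over existing nodes.
    for (std::uint64_t u = 0; u < n; ++u) {
        if (!g.exists[u]) {
            out.push_back(0);  // deleted node is encoded as a zero-degree entry
            continue;
        }
        out.push_back(g.adj[u].size());
        out.insert(out.end(), g.adj[u].begin(), g.adj[u].end());
    }
    return out;
}

ToyGraph readToy(const std::vector<std::uint64_t> &in) {
    std::size_t pos = 0;
    const std::uint64_t n = in.at(pos++);
    ToyGraph g;
    g.exists.assign(n, true);
    g.adj.resize(n);
    // A node is removed when DELETED_BIT is set.
    for (std::uint64_t u = 0; u < n; ++u)
        if (in.at(pos++) & DELETED_BIT) g.exists[u] = false;
    // One degree entry is consumed per node ID, matching the writer.
    for (std::uint64_t u = 0; u < n; ++u) {
        const std::uint64_t deg = in.at(pos++);
        for (std::uint64_t i = 0; i < deg; ++i)
            g.adj[u].push_back(in.at(pos++));
    }
    assert(pos == in.size());  // the whole stream was consumed, so nothing is misaligned
    return g;
}
```

Writing a graph with a hole in its ID space and reading it back should reproduce both the `exists` flags and the adjacency lists, because writer and reader walk the same ID range in the same order.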

Result

Fixes #1278

@Schwarf Schwarf changed the title from "Fix binary I/O for graphs with deleted nodes (sparse node IDs)" to "Fix binary I/O for graphs with deleted nodes" on Jan 10, 2026

coveralls commented Jan 10, 2026

Pull Request Test Coverage Report for Build 21291359119

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 39 of 80 (48.75%) changed or added relevant lines in 2 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.04%) to 79.373%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| networkit/cpp/io/NetworkitBinaryReader.cpp | 12 | 53 | 22.64% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| networkit/flow.pyx | 2 | 95.12% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 20781110983: | -0.04% |
| Covered Lines: | 29538 |
| Relevant Lines: | 37214 |

💛 - Coveralls

@Schwarf Schwarf force-pushed the fix/binary_reader_writer_deleted_nodes branch 2 times, most recently from 73439b2 to b06f7da on January 11, 2026 10:12
@Schwarf Schwarf force-pushed the fix/binary_reader_writer_deleted_nodes branch from b06f7da to e1c5cde on January 11, 2026 10:46
Contributor Author

Schwarf commented Jan 11, 2026

Hi @fabratu,

I’m a bit puzzled by the CPython 3.13 failures. They don’t seem related to this PR (CPython 3.10 builds pass).

The failure is in testEigenvectorsReverse, which checks a specific eigenvector entry. Since eigenvectors are only defined up to a scalar, this seems brittle and likely sensitive to SciPy / BLAS / compiler differences.

I suspect this is an environment issue rather than a regression from this PR. Happy to adjust the test (e.g. residual check) or follow your preferred approach.
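
For reference, a residual-style check could look roughly like the sketch below. It is written in plain C++ on a dense matrix purely for illustration; `Matrix`, `residualNorm` and `isEigenpair` are hypothetical names and not part of the existing test suite. Instead of comparing a single eigenvector entry, it tests whether A·v − λ·v is small relative to the norm of v, which holds for any nonzero scalar multiple of a valid eigenvector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Euclidean norm of a vector.
double norm2(const std::vector<double> &v) {
    double sumSq = 0.0;
    for (double x : v) sumSq += x * x;
    return std::sqrt(sumSq);
}

// Euclidean norm of the residual A*v - lambda*v.
double residualNorm(const Matrix &A, double lambda, const std::vector<double> &v) {
    assert(A.size() == v.size());
    double sumSq = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        assert(A[i].size() == v.size());
        double r = -lambda * v[i];
        for (std::size_t j = 0; j < v.size(); ++j)
            r += A[i][j] * v[j];
        sumSq += r * r;
    }
    return std::sqrt(sumSq);
}

// True if (lambda, v) is an eigenpair of A up to a relative tolerance.
// The zero vector is rejected, since it is never an eigenvector.
bool isEigenpair(const Matrix &A, double lambda, const std::vector<double> &v, double tol) {
    const double vNorm = norm2(v);
    if (vNorm == 0.0) return false;
    return residualNorm(A, lambda, v) <= tol * vNorm;
}
```

With A = [[2, 1], [1, 2]] and λ = 3, the vectors (1, 1), (−1, −1) and (2, 2) all pass such a check, whereas an entry-wise comparison would accept only one of them.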

@fabratu fabratu added the bug label Jan 13, 2026
Member

fabratu commented Jan 13, 2026

You are right, this is an environmental regression (also happening on master). It appears that the newly released SciPy 1.17.0 computes the eigenvalue / eigenvector as zero for our test matrix, so the result deviates too much from the expected one.

I have not yet looked into the respective code; for 1.16.2 (and below), we get the correct answer. I have filed a bug report and will open a PR to temporarily pin SciPy to <1.17.0.

Will also add a review shortly.

@fabratu fabratu self-assigned this Jan 13, 2026
Development

Successfully merging this pull request may close these issues.

NetworkitBinaryReader/Writer + deleted nodes + deleted edges
