Enhancement: replace networkx + json with ladybug + icebug #1339

aheev · 2026-06-16T13:20:08Z

Ladybug is a lightweight embedded graph database optimised for graph workloads.

icebug is a high-performance open-source library engineered for parallel processing and large-scale graph analysis. When paired with Arrow-backed memory, it is 10x-100x faster than NetworKit / NetworkX.

Proposal

Currently, graphify is not particularly efficient in either storage or performance. Replacing networkx + json with ladybug + icebug would substantially improve storage, scalability and performance.

Small graphs (in-memory)

Ladybug can replace both the in-memory NetworkX graph and the on-disk JSON, while icebug runs complex algorithms such as community detection. This enables faster builds, faster serving and more efficient storage.

Build

After extraction:

An in-memory ladybug instance is created with node and rel tables (schema enforced). For incremental builds, the previously exported graph can be imported using the IMPORT DATABASE command.
Nodes and rels from the extraction phase can be written directly to the ladybug instance.
For clustering, ladybug offers export_as_arrow_csr (which converts ladybug tables to Arrow CSR tables); icebug can then use the result to run community detection.
During the dedup or analysis phase, either ladybug (via its algo extension or Cypher queries) or icebug (for faster parallel processing) can be used to run the different algorithms.
Finally, the graph can be persisted to disk in the space-efficient Parquet format using ladybug's EXPORT DATABASE command.
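
To make the clustering hand-off more concrete, here is a minimal pure-Python sketch of the compressed sparse row (CSR) layout itself. It does not call ladybug or icebug: `edges_to_csr` and `neighbors` are hypothetical helper names, node ids are assumed to be dense integers, and the real Arrow-backed output of export_as_arrow_csr may be organised differently.

```python
from typing import List, Sequence, Tuple


def edges_to_csr(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert a directed edge list into CSR arrays (indptr, indices)."""
    # Count the out-degree of every source node, shifted by one slot.
    indptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        indptr[src + 1] += 1
    # A prefix sum turns the counts into row offsets.
    for i in range(num_nodes):
        indptr[i + 1] += indptr[i]
    # Place each destination into the next free slot of its source row.
    indices = [0] * len(edges)
    cursor = indptr[:-1].copy()
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def neighbors(indptr: List[int], indices: List[int], node: int) -> List[int]:
    """Return the out-neighbours of `node` as a contiguous slice."""
    return indices[indptr[node]:indptr[node + 1]]


indptr, indices = edges_to_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
print(indptr, indices, neighbors(indptr, indices, 0))
```

Because each node's neighbours sit in one contiguous slice, a parallel algorithm can scan adjacency lists without pointer chasing, which is the property the proposal relies on for community detection.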

PS: updates can be written to the graph through the ladybug instance during any of these phases.
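
The build flow above could be driven by a short sequence of Cypher statements. The sketch below only assembles the statement strings and does not execute them. The table names `nodes` and `edges` follow the export listing in the Analysis section; the column names, the `build_statements` helper and the exact DDL and `(format="parquet")` syntax are assumptions based on Kuzu-style Cypher and should be checked against the ladybug docs.

```python
from typing import List, Optional

# Hypothetical schema: table names mirror nodes.parquet and
# edges_nodes_nodes.parquet; the columns are illustrative assumptions.
NODE_DDL = "CREATE NODE TABLE nodes(id STRING, label STRING, PRIMARY KEY (id))"
REL_DDL = "CREATE REL TABLE edges(FROM nodes TO nodes, relation STRING)"


def build_statements(export_dir: str, prev_export: Optional[str] = None) -> List[str]:
    """Return the Cypher statements bracketing one build run."""
    statements: List[str] = []
    if prev_export is not None:
        # Incremental build: restore schema and data from the last export.
        statements.append(f"IMPORT DATABASE '{prev_export}'")
    else:
        # Fresh build: create the schema-enforced tables.
        statements += [NODE_DDL, REL_DDL]
    # Extraction writes, dedup and clustering would run between these steps.
    statements.append(f"EXPORT DATABASE '{export_dir}' (format=\"parquet\")")
    return statements


for stmt in build_statements("ladybug-export", prev_export="ladybug-export"):
    print(stmt)
```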

Serve

All serve requests can be processed efficiently in ladybug using Cypher queries.
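
As a rough illustration, serve requests could be mapped to parameterised Cypher templates. The request kinds, the `to_cypher` helper and the property names (`id`, `label`, `relation`) are hypothetical; only the table names `nodes` and `edges` come from the export listing in the Analysis section. The sketch builds the query and its parameters but does not execute them.

```python
from typing import Any, Dict, Tuple

# Hypothetical request kinds mapped to parameterised Cypher templates.
QUERIES: Dict[str, str] = {
    "node": "MATCH (n:nodes) WHERE n.id = $id RETURN n.id, n.label",
    "neighbors": (
        "MATCH (a:nodes)-[e:edges]-(b:nodes) "
        "WHERE a.id = $id RETURN b.id, e.relation"
    ),
}


def to_cypher(kind: str, node_id: str) -> Tuple[str, Dict[str, Any]]:
    """Return (query, parameters) for a serve request, or raise ValueError."""
    if kind not in QUERIES:
        raise ValueError(f"unsupported request kind: {kind}")
    # Parameters are passed separately instead of being formatted into the
    # query string, so node ids never need manual escaping.
    return QUERIES[kind], {"id": node_id}


query, params = to_cypher("neighbors", "module_a")
print(query, params)
```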

Large graphs

For large graphs that cannot fit in memory, an on-disk ladybug instance can be created (new Database('<path>/graph.lbdb')). The rest of the process is the same, except for importing and exporting graphs: explicit import and export commands are not required, as all writes are persisted to a .lbdb file on disk.
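
One way to switch between the two modes is to pick the database location from a size estimate. This is a sketch only: `choose_db_path`, the memory budget and the `":memory:"` sentinel are assumptions (the sentinel is borrowed from Kuzu-style APIs), and the chosen value would then be passed to the Database constructor.

```python
import os


def choose_db_path(estimated_bytes: int, memory_budget_bytes: int, base_dir: str) -> str:
    """Pick an in-memory or on-disk database location for a build."""
    if estimated_bytes <= memory_budget_bytes:
        # Small graph: keep everything in memory and export at the end.
        return ":memory:"
    # Large graph: writes go straight to the .lbdb file, so no explicit
    # import or export step is needed.
    return os.path.join(base_dir, "graph.lbdb")


print(choose_db_path(33 * 10**6, 2 * 10**9, "graphify-out"))
print(choose_db_path(50 * 10**9, 2 * 10**9, "graphify-out"))
```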

Analysis

As Arrow and Parquet are columnar formats, and Parquet files are typically compressed, the memory and storage footprint would be much smaller.

From a graphify run on the ladybug repo:

graph.json : ~33 MB
ladybug-export/ : ~2.5 MB

211 Jun 16 15:58 copy.cypher
1069880 Jun 16 15:58 edges_nodes_nodes.parquet
1522237 Jun 16 15:58 nodes.parquet
455 Jun 16 15:58 schema.cypher

This is roughly a 13x reduction in on-disk size.
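
The ratio can be checked directly from the listing above. The byte counts are taken from the listing; treating "~33 MB" as 33 × 10^6 bytes is an assumption, since the exact size of graph.json is not given.

```python
# File sizes in bytes, copied from the ladybug-export/ listing.
export_sizes = {
    "copy.cypher": 211,
    "edges_nodes_nodes.parquet": 1069880,
    "nodes.parquet": 1522237,
    "schema.cypher": 455,
}

total_bytes = sum(export_sizes.values())
json_bytes = 33 * 10**6  # assumption: "~33 MB" read as decimal megabytes
ratio = json_bytes / total_bytes

print(f"export total: {total_bytes} bytes (~{total_bytes / 1e6:.2f} MB)")
print(f"reduction: ~{ratio:.1f}x")
```

The total comes to 2,592,783 bytes, giving a ratio of about 12.7, which is consistent with the reduction quoted above.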

Queries and algorithm runs would certainly be faster too, as ladybug and icebug are optimised for larger graphs. Take a look at the following benchmarks:

References

minimal notebook for a complete ladybug+icebug demonstration
ladybug docs

aheev · 2026-06-16T13:20:20Z

Happy to answer any questions!!


aheev · 2026-06-19T05:40:42Z

@safishamsi just following up on the proposal review. I was wondering if you've had a chance to look at it.
