Replies: 2 comments
Happy to answer any questions!!
@safishamsi Just following up on the proposal review. I was wondering if you've had a chance to look at it.
Ladybug is a lightweight embedded graph database optimised for graph workloads.
icebug is a high-performance, open-source library engineered for parallel processing and large-scale graph analysis. When paired with Arrow-backed memory, it is 10x-100x faster than NetworKit / NetworkX.
Proposal
Currently, graphify is not particularly efficient in terms of storage and performance. Replacing its storage and analysis layer with ladybug + icebug would provide substantial improvements in storage footprint, scalability and performance.
Small graphs (in-memory)
Ladybug can replace the in-memory nxGraph and the on-disk JSON, while icebug can be used to run complex algorithms such as community detection. This enables a faster build and serve, and more efficient storage.
Build
After extraction:
1. Import the graph into an in-memory ladybug instance using the IMPORT DATABASE command.
2. Call export_as_arrow_csr (which converts ladybug tables to Arrow CSR tables); icebug can then use these tables to run community detection.
3. Run other algorithms with either ladybug (via the algo extension or Cypher queries) or icebug (for faster parallel processing).
4. Persist the graph using the EXPORT DATABASE command in ladybug.
PS: updates can be written to the graph through the ladybug instance during any of these phases.
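Step 2 relies on a CSR (compressed sparse row) layout. As a rough illustration of what such a conversion produces, here is a pure-Python sketch that turns an edge list into CSR offsets and indices arrays. It is not ladybug's export_as_arrow_csr: plain Python lists stand in for Arrow arrays, dense integer node ids 0..n-1 are assumed, and build_csr / neighbors are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def build_csr(
    num_nodes: int,
    edges: Sequence[Tuple[int, int]],
    undirected: bool = True,
) -> Tuple[List[int], List[int]]:
    """Build CSR (offsets, indices) arrays from an edge list over node ids 0..num_nodes-1."""
    # For an undirected graph, store each edge in both directions
    # (a self-loop is stored only once).
    arcs: List[Tuple[int, int]] = []
    for u, v in edges:
        if not (0 <= u < num_nodes and 0 <= v < num_nodes):
            raise ValueError(f"edge ({u}, {v}) references an unknown node")
        arcs.append((u, v))
        if undirected and u != v:
            arcs.append((v, u))

    # offsets[i]..offsets[i + 1] delimits node i's neighbours in `indices`,
    # so offsets is the running (prefix) sum of out-degrees.
    offsets = [0] * (num_nodes + 1)
    for u, _ in arcs:
        offsets[u + 1] += 1
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]

    # Scatter each arc's target into the slot range of its source node.
    indices = [0] * len(arcs)
    cursor = offsets[:-1]  # slicing a list makes a copy, offsets stays intact
    for u, v in arcs:
        indices[cursor[u]] = v
        cursor[u] += 1
    return offsets, indices


def neighbors(offsets: List[int], indices: List[int], node: int) -> List[int]:
    """A node's neighbours are one contiguous slice of `indices`."""
    return indices[offsets[node]:offsets[node + 1]]


if __name__ == "__main__":
    # A triangle 0-1-2, stored undirected.
    offsets, indices = build_csr(3, [(0, 1), (1, 2), (0, 2)])
    print(offsets)                         # [0, 2, 4, 6]
    print(indices)                         # [1, 2, 0, 2, 1, 0]
    print(neighbors(offsets, indices, 2))  # [1, 0]
```

One reason this layout suits parallel graph analytics is that each node's adjacency is a contiguous slice of a flat array, so it can be scanned without per-node hash lookups.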
Serve
All serve requests can be processed efficiently in ladybug using Cypher queries.
Large graphs
For large graphs that cannot fit in memory, an on-disk ladybug instance can be created (new Database('<path>/graph.lbdb')). The rest of the process is the same, except for importing and exporting graphs: explicit IMPORT and EXPORT commands are not required, as all writes are persisted to a .lbdb file on disk.
Analysis
Arrow is a columnar in-memory format and Parquet is a compressed columnar storage format, so the memory and storage footprint would be much smaller.
On a graphify run over the ladybug repo, this resulted in a 13x reduction.
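One reason columnar formats shrink repetitive graph data is that a column with few distinct values (for example a node-type column) can be dictionary-encoded: each distinct value is stored once, plus a small integer code per row. Parquet supports dictionary encoding, and pyarrow exposes the same idea as Array.dictionary_encode(). The sketch below is a simplified pure-Python version, not Parquet's on-disk format; the column contents are made up for illustration, and the byte counts ignore length prefixes and other overhead.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once, plus one small integer code per row."""
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Invert dictionary_encode by looking each code up in the dictionary."""
    return [dictionary[c] for c in codes]


if __name__ == "__main__":
    # Hypothetical node-type column, as a graph extractor might produce.
    node_types = ["function", "class", "function", "file", "function"]
    dictionary, codes = dictionary_encode(node_types)
    print(dictionary)  # ['function', 'class', 'file']
    print(codes)       # [0, 1, 0, 2, 0]
    assert dictionary_decode(dictionary, codes) == node_types

    # Rough size estimate, assuming 1 byte per character and per code.
    raw = sum(len(v) for v in node_types)
    encoded = sum(len(v) for v in dictionary) + len(codes)
    print(raw, encoded)  # 33 22
```

The gap widens as the column grows, since the dictionary stays the same size while each extra row costs only one small code.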
Queries and algorithm runs would also be faster, as ladybug and icebug are optimised for large graphs. See the following benchmarks: