

Ladybug-Memory/icebug-format


Icebug Format

Icebug is a standardized graph format designed for efficient graph data interchange. It comes in two flavours:

| Format | Storage | Use case |
|---|---|---|
| icebug-disk | Parquet files | Object storage, persistence |
| icebug-memory | Apache Arrow tables | In-process, zero-copy access |

Both represent directed graphs in CSR (Compressed Sparse Row) format, which enables fast adjacency-list traversal.
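To make the CSR layout concrete, here is a minimal pure-Python sketch. It is not icebug-format's own code. It assumes nodes are already numbered 0..N-1 (row offsets), builds the two CSR arrays from an edge list, and reads one node's outgoing neighbours as a contiguous slice. The helper names `build_csr` and `neighbors` are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_csr(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build (indptr, indices) for a directed graph with nodes 0..num_nodes-1."""
    # Out-degree of every source node.
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1

    # indptr[u] is where u's neighbours start in indices; it has num_nodes + 1 entries.
    indptr = [0] * (num_nodes + 1)
    for u in range(num_nodes):
        indptr[u + 1] = indptr[u] + counts[u]

    # Place each target in its source's slot, preserving input order within a row.
    cursor = indptr[:-1]  # slicing makes a copy, so indptr is not modified
    indices = [0] * len(edges)
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def neighbors(indptr: List[int], indices: List[int], u: int) -> List[int]:
    """Outgoing neighbours of u: one contiguous slice of indices."""
    return indices[indptr[u]:indptr[u + 1]]


indptr, indices = build_csr(3, [(0, 1), (0, 2), (2, 0), (1, 2)])
print(indptr, indices)  # [0, 2, 3, 4] [1, 2, 2, 0]
print(neighbors(indptr, indices, 0))  # [1, 2]
```

Because each node's neighbours occupy one contiguous run of `indices`, a lookup is a single slice whose cost is proportional to that node's out-degree.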


icebug-disk v1

CLI

Convert a DuckDB source database containing nodes_* and edges_* tables into Parquet files plus a schema.cypher file that a graph database can mount directly:

```shell
# --schema: input Cypher schema describing the rel tables
uv run icebug-format \
  --source-db examples/karate/duckdb/karate_random.duckdb \
  --schema examples/karate/duckdb/schema.cypher
```

Output structure

For each node table nodes_<name> and edge table edges_<name>, the following files/tables are produced:

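| File | Description |
|---|---|
| `nodes_<name>.parquet` | Original node table with its attributes (one per node table) |
| `indices_<name>.parquet` | Target node of each edge, sorted by source; length E, the number of edges (one per edge table) |
| `indptr_<name>.parquet` | Row-pointer array of length N+1, where N is the number of source nodes (one per edge table) |
| `schema.cypher` | Cypher schema for mounting in a graph database |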
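The two CSR files are tied together by a few invariants. The sketch below is a hypothetical validator over plain Python lists, such as you might get by reading each file's single column; it assumes `indices` holds 0-based row offsets into the destination node table, which is an assumption rather than something stated above. It also shows that out-degrees fall directly out of `indptr`. `check_csr` and `out_degrees` are illustrative names.

```python
from typing import List, Sequence


def check_csr(indptr: Sequence[int], indices: Sequence[int],
              num_source_nodes: int, num_target_nodes: int) -> None:
    """Raise ValueError if indptr/indices break basic CSR invariants."""
    if len(indptr) != num_source_nodes + 1:
        raise ValueError(f"indptr has {len(indptr)} entries, expected N + 1 = {num_source_nodes + 1}")
    if indptr[0] != 0 or indptr[-1] != len(indices):
        raise ValueError("indptr must start at 0 and end at E = len(indices)")
    if any(a > b for a, b in zip(indptr, indptr[1:])):
        raise ValueError("indptr must be non-decreasing")
    # Assumption: indices are 0-based row offsets into the destination node table.
    if any(not 0 <= t < num_target_nodes for t in indices):
        raise ValueError("indices contains an out-of-range target row")


def out_degrees(indptr: Sequence[int]) -> List[int]:
    """Out-degree of each source node is the gap between consecutive pointers."""
    return [b - a for a, b in zip(indptr, indptr[1:])]


check_csr([0, 2, 3, 4], [1, 2, 2, 0], num_source_nodes=3, num_target_nodes=3)
print(out_degrees([0, 2, 3, 4]))  # [2, 1, 1]
```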

NOTE: Each Parquet file stores icebug_disk_version in its metadata.

Example

Starting from a demo-db.duckdb with nodes_user, nodes_city, edges_follows, and edges_livesin tables:

```shell
uv run icebug-format \
  --source-db demo-db.duckdb \
  --schema demo-db/schema.cypher
```

Verify the result with test_csr_duckdb.py:

```shell
uv run ./icebug-format/test_csr_duckdb.py --input demo-db_csr
```

Output:

```
Metadata: 7 nodes, 8 edges, directed=True

Node Tables:
Table: demo_nodes_user
(100, 'Adam', 30) ...

Edge Tables (reconstructed from CSR):
Table: follows (FROM user TO user)
(100, 250, 2020) ...
```
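The edge rows above are reconstructed from CSR. A minimal sketch of that decoding step follows; it is not the code in test_csr_duckdb.py. It assumes `indices` holds row offsets into the destination node table and that edge properties are columns aligned position-by-position with `indices`. `edges_from_csr` and the sample values are illustrative.

```python
from typing import Iterator, Optional, Sequence, Tuple


def edges_from_csr(
    src_keys: Sequence,            # primary keys of the source node table, in row order
    dst_keys: Sequence,            # primary keys of the destination node table, in row order
    indptr: Sequence[int],
    indices: Sequence[int],        # assumed: 0-based row offsets into dst_keys
    props: Optional[Sequence[Sequence]] = None,  # edge-property columns aligned with indices
) -> Iterator[Tuple]:
    """Yield (source_key, target_key, *properties) in source-sorted order."""
    props = props or []
    for row, key in enumerate(src_keys):
        # Edges of this source occupy positions indptr[row] .. indptr[row + 1] - 1.
        for pos in range(indptr[row], indptr[row + 1]):
            yield (key, dst_keys[indices[pos]], *(col[pos] for col in props))


user_ids = [100, 250, 300]
rows = list(edges_from_csr(user_ids, user_ids, [0, 2, 3, 3], [1, 2, 0], [[2020, 2022, 2021]]))
print(rows)  # [(100, 250, 2020), (100, 300, 2022), (250, 100, 2021)]
```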

icebug-memory v1

Python API

Convert Arrow tables directly into an in-memory CSR graph:

```python
from icebug_format import IcebugMemGraph

# users, cities, follows and livesin are pyarrow Tables prepared beforehand.

# Directed heterogeneous graph (different node types on each end)
graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
    from_node_arrow_table=users,   # pa.Table, first column is the primary key
    rel_arrow_table=livesin,       # pa.Table with 'source' and 'target' columns
    to_node_arrow_table=cities,    # pa.Table, first column is the primary key
)

# Homogeneous graph (same node table on both ends) with reverse edges added.
# Drop add_reverse_edges=True to keep only the original directed edges.
graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
    from_node_arrow_table=users,   # pa.Table, first column is the primary key
    rel_arrow_table=follows,       # pa.Table with 'source' and 'target' columns
    add_reverse_edges=True,        # requires to_node_arrow_table to be omitted
)

# Node tables are passed through unchanged
graph.src    # pa.Table: source nodes
graph.dest   # pa.Table: destination nodes

# CSR adjacency structure
graph.indices  # pa.Table: 'target' column (+ any edge properties), sorted by source
graph.indptr   # pa.Table: 'ptr' column of length len(src) + 1
```

The rel_arrow_table source and target columns are resolved by name in priority order, with a positional fallback:

| Role | Accepted names (in priority order) | Fallback |
|---|---|---|
| Source | source, src, from | column at index 0 (first column) |
| Target | target, destination, dest, to | column at index 1 (second column) |

Any remaining columns are preserved as edge properties in graph.indices.
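The lookup rule in the table can be written as a small function. This is a sketch of the documented priority order and positional fallback only; exact, case-sensitive name matching and the helper name `resolve_endpoint_columns` are assumptions, and the sketch does not guard against a named match colliding with a positional fallback.

```python
from typing import List, Sequence, Tuple

SOURCE_NAMES = ("source", "src", "from")
TARGET_NAMES = ("target", "destination", "dest", "to")


def resolve_endpoint_columns(column_names: Sequence[str]) -> Tuple[str, str, List[str]]:
    """Pick (source, target, edge_property_columns) following the priority table."""
    def pick(candidates: Sequence[str], fallback_index: int) -> str:
        for name in candidates:      # first accepted name present wins
            if name in column_names:
                return name
        return column_names[fallback_index]   # positional fallback

    src = pick(SOURCE_NAMES, 0)
    dst = pick(TARGET_NAMES, 1)
    props = [c for c in column_names if c not in (src, dst)]
    return src, dst, props


print(resolve_endpoint_columns(["weight", "dest", "src"]))  # ('src', 'dest', ['weight'])
```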

Use --add-reverse-edges in the CLI, or add_reverse_edges=True in the Python API, to emit a symmetric adjacency by adding reverse edges. For reverse-edge expansion, to_node_arrow_table must be omitted; the same node table is used for both sides of every edge.
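Reverse-edge expansion amounts to appending a flipped copy of every edge. A minimal sketch on tuples of `(source_row, target_row, *properties)`; `with_reverse_edges` is hypothetical, copying each edge's properties onto its reverse is an assumption, and since the text does not say whether self-loops or already-present reverse pairs are deduplicated, this sketch does neither.

```python
from typing import Iterable, List, Tuple


def with_reverse_edges(edges: Iterable[Tuple]) -> List[Tuple]:
    """Return the edges followed by a flipped copy (v, u, *props) of each (u, v, *props)."""
    edges = list(edges)  # materialise so a generator can be read twice
    return edges + [(v, u, *props) for u, v, *props in edges]


print(with_reverse_edges([(0, 1, 2020), (1, 2, 2021)]))
# [(0, 1, 2020), (1, 2, 2021), (1, 0, 2020), (2, 1, 2021)]
```

Both endpoints index the same node table here, which is why expansion is limited to rel tables with the same node type on both ends.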

Caveats

  • icebug-format always outputs a directed graph.
  • If an algorithm needs symmetric adjacency, pass --add-reverse-edges to the CLI or add_reverse_edges=True to the Python API, and the reverse of every edge is added automatically. Reverse-edge expansion is supported only for rel tables with the same node type on both ends.
  • Reverse-edge expansion is all or nothing for a conversion: --add-reverse-edges cannot be applied selectively per edge type. If your graph mixes edge types that should be symmetric (such as friends) with edge types that should stay directed (such as follows), run separate conversions or add the reverse edges yourself before calling icebug-format.

Further reading

Blog post: Graph Archiving with Apache GraphAR

About

A proposal for graph standardization
