Icebug is a standardized graph format designed for efficient graph data interchange. It comes in two flavours:
| Format | Storage | Use case |
|---|---|---|
| icebug-disk | Parquet files | Object storage, persistence |
| icebug-memory | Apache Arrow tables | In-process, zero-copy access |
Both represent directed graphs in CSR (Compressed Sparse Row) format, which enables fast adjacency-list traversal.
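To make the CSR layout concrete, here is a minimal pure-Python sketch that builds the two CSR arrays from an edge list. The helper name `build_csr` and the use of plain lists are assumptions for illustration and are not part of icebug-format; nodes are assumed to be numbered `0..N-1`.

```python
def build_csr(num_nodes, edges):
    """Build CSR arrays (indptr, indices) from (source, target) pairs.

    Hypothetical helper for illustration; not part of icebug-format.
    Nodes are assumed to be integers in range(num_nodes).
    """
    # Count outgoing edges per source node.
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1

    # Prefix sum: indptr[i] is where node i's neighbours start in indices.
    indptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        indptr[i + 1] = indptr[i] + counts[i]

    # Place each target into its source's slot, preserving input order.
    indices = [0] * len(edges)
    cursor = indptr[:-1].copy()
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
indptr, indices = build_csr(4, edges)
print(indptr)   # [0, 2, 3, 3, 4]
print(indices)  # [1, 2, 2, 0]
```

Node 2 has no outgoing edges, so `indptr[2] == indptr[3]`; the neighbours of node 0 are the slice `indices[0:2]`.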
Convert a DuckDB source database containing nodes_* / edges_* tables into Parquet files and a schema.cypher that a graph database can mount directly:
```bash
uv run icebug-format \
  --source-db examples/karate/duckdb/karate_random.duckdb \
  --schema examples/karate/duckdb/schema.cypher  # input schema for rel tables
```

For each node table `nodes_<name>` and edge table `edges_<name>`, the following files/tables are produced:
| Name | Description |
|---|---|
| `nodes_<name>.parquet` | Original node table with attributes |
| `indices_<name>.parquet` | Target node for each edge, sorted by source (size E) |
| `indptr_<name>.parquet` | Row-pointer array of size N+1 |
| `schema.cypher` | Cypher schema for mounting in a graph database |
NOTE: Each Parquet file stores icebug_disk_version in its metadata.
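The row-pointer and target arrays described above are enough to answer adjacency queries without scanning the edge list. The following sketch shows the lookup on plain Python lists; the function names are hypothetical, and it assumes the targets are stored as row positions into the node table, which the text above does not confirm.

```python
def neighbors(indptr, indices, node):
    """Return the targets of all edges leaving `node` (a row position)."""
    return indices[indptr[node]:indptr[node + 1]]


def out_degree(indptr, node):
    """Number of outgoing edges of `node`, read from indptr alone."""
    return indptr[node + 1] - indptr[node]


# Toy arrays with N = 3 nodes and E = 4 edges (made-up values).
indptr = [0, 2, 2, 4]    # length N + 1
indices = [1, 2, 0, 1]   # length E, grouped by source

print(neighbors(indptr, indices, 0))  # [1, 2]
print(neighbors(indptr, indices, 1))  # []
print(out_degree(indptr, 2))          # 2
```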
Starting from a demo-db.duckdb with nodes_user, nodes_city, edges_follows, and edges_livesin tables:
```bash
uv run icebug-format \
  --source-db demo-db.duckdb \
  --schema demo-db/schema.cypher
```

Verify the result with test_csr_duckdb.py:
```bash
uv run ./icebug-format/test_csr_duckdb.py --input demo-db_csr
```

```
Metadata: 7 nodes, 8 edges, directed=True
Node Tables:
Table: demo_nodes_user
(100, 'Adam', 30) ...
Edge Tables (reconstructed from CSR):
Table: follows (FROM user TO user)
(100, 250, 2020) ...
```
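The "reconstructed from CSR" step in the output above can be sketched as follows. This is not the code of test_csr_duckdb.py; it is a minimal illustration under the assumption that targets are stored as row positions and that edge properties sit alongside them in the same order. The function name and the toy values are made up.

```python
def csr_to_edges(indptr, targets, src_keys, dst_keys, props=None):
    """Expand CSR arrays back into (source_key, target_key[, prop]) rows.

    Hypothetical helper; assumes `targets` holds row positions into dst_keys.
    """
    edges = []
    for src_pos in range(len(indptr) - 1):
        # Edges of this source occupy the slice indptr[src_pos]:indptr[src_pos + 1].
        for e in range(indptr[src_pos], indptr[src_pos + 1]):
            row = (src_keys[src_pos], dst_keys[targets[e]])
            if props is not None:
                row += (props[e],)
            edges.append(row)
    return edges


# Toy data (invented for illustration, not the demo database).
user_ids = [10, 20, 30]          # primary keys, by row position
indptr = [0, 2, 3, 3]
targets = [1, 2, 0]
since = [2001, 2002, 2003]       # one edge property per edge

edges = csr_to_edges(indptr, targets, user_ids, user_ids, since)
print(edges)  # [(10, 20, 2001), (10, 30, 2002), (20, 10, 2003)]
```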
Convert Arrow tables directly into an in-memory CSR graph:
```python
from icebug_format import IcebugMemGraph

# users, cities, livesin, and follows are pa.Table (pyarrow) objects created elsewhere.

# Directed heterogeneous graph (different node types on each end)
graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
    from_node_arrow_table=users,   # pa.Table, first column is the primary key
    rel_arrow_table=livesin,       # pa.Table with 'source' and 'target' columns
    to_node_arrow_table=cities,    # pa.Table, first column is the primary key
)

# Homogeneous graph (same node type on both ends), here with reverse edges added
graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
    from_node_arrow_table=users,   # pa.Table, first column is the primary key
    rel_arrow_table=follows,       # pa.Table with 'source' and 'target' columns
    add_reverse_edges=True,        # to_node_arrow_table must be omitted
)

# Node tables are passed through unchanged
graph.src      # pa.Table: source nodes
graph.dest     # pa.Table: destination nodes

# CSR adjacency structure
graph.indices  # pa.Table: 'target' column (+ any edge properties), sorted by source
graph.indptr   # pa.Table: 'ptr' column of length len(src) + 1
```

The rel_arrow_table source and target columns are resolved by name in priority order, with a positional fallback:
| Role | Accepted names (in order) | Fallback |
|---|---|---|
| Source | source, src, from | 0th column |
| Target | target, destination, dest, to | 1st column |
Any remaining columns are preserved as edge properties in graph.indices.
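The resolution rule in the table can be sketched in a few lines of Python. This is an illustration of the described behaviour and not the library's implementation: the function names are hypothetical, and exact, case-sensitive name matching is an assumption.

```python
SOURCE_NAMES = ("source", "src", "from")
TARGET_NAMES = ("target", "destination", "dest", "to")


def pick_column(column_names, accepted, fallback_index):
    """Return the first accepted name present, else the positional fallback."""
    for name in accepted:
        if name in column_names:
            return name
    return column_names[fallback_index]


def resolve_edge_columns(column_names):
    """Split column names into (source, target, edge_property_columns)."""
    source = pick_column(column_names, SOURCE_NAMES, 0)
    target = pick_column(column_names, TARGET_NAMES, 1)
    properties = [c for c in column_names if c not in (source, target)]
    return source, target, properties


print(resolve_edge_columns(["src", "dest", "since"]))
# ('src', 'dest', ['since'])
print(resolve_edge_columns(["a", "b", "weight"]))
# ('a', 'b', ['weight'])  (positional fallback)
print(resolve_edge_columns(["dest", "source", "target"]))
# ('source', 'target', ['dest'])  ('target' outranks 'dest')
```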
Use --add-reverse-edges in the CLI, or add_reverse_edges=True in the Python API, to emit a symmetric adjacency by adding reverse edges. For reverse-edge expansion, to_node_arrow_table must be omitted; the same node table is used for both sides of every edge.
- icebug-format always outputs a directed graph.
- If an algorithm needs symmetric adjacency, pass `--add-reverse-edges` to the CLI or `add_reverse_edges=True` to the Python API. Reverse edges will be added automatically. Reverse-edge expansion is supported only for rel tables with the same node type on both ends.
- Reverse-edge expansion is all or nothing for a conversion. If your graph mixes edge types that should be symmetric, such as `friends`, with edge types that should stay directed, such as `follows`, run separate conversions or add reverse edges before calling icebug-format; `--add-reverse-edges` cannot be applied selectively per edge type.
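For the last case, where reverse edges are added before calling icebug-format, the expansion itself is small. The sketch below works on plain tuples and uses a hypothetical helper name; copying edge properties onto the reverse edge and leaving duplicates in place are assumptions, and icebug-format's own expansion may differ.

```python
def add_reverse_edges(edges):
    """Return edges plus a reversed copy of each (source, target, *props) row.

    Hypothetical helper; properties are copied to the reverse edge and
    existing duplicates are not removed.
    """
    reversed_edges = [(t, s, *props) for s, t, *props in edges]
    return edges + reversed_edges


# Toy 'friends' edges with one property column (made-up values).
friends = [(0, 1, 2020), (1, 2, 2021)]
symmetric = add_reverse_edges(friends)
print(symmetric)
# [(0, 1, 2020), (1, 2, 2021), (1, 0, 2020), (2, 1, 2021)]
```

Only the edge types that should be symmetric need this step; directed types such as `follows` can be passed through as they are.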