Icebug Format

Icebug is a standardized graph format designed for efficient graph data interchange. It comes in two flavours:

Format	Storage	Use case
icebug-disk	Parquet files	Object storage, persistence
icebug-memory	Apache Arrow tables	In-process, zero-copy access

Both represent directed graphs in CSR (Compressed Sparse Row) format, which enables fast adjacency-list traversal.
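To make the CSR layout concrete, here is a minimal pure-Python sketch that builds the two arrays from an edge list and reads back one node's neighbours. It assumes nodes are numbered 0..N-1; the helper names `build_csr` and `neighbors` are illustrative and are not part of icebug-format.

```python
from typing import List, Tuple


def build_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build (indptr, indices) for a directed graph with nodes 0..num_nodes-1."""
    # Count the outgoing edges of every source node.
    indptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        indptr[src + 1] += 1
    # A running sum turns the counts into row offsets (length N+1).
    for i in range(num_nodes):
        indptr[i + 1] += indptr[i]
    # Targets ordered by source; the stable sort keeps input order per source.
    indices = [dst for _, dst in sorted(edges, key=lambda e: e[0])]
    return indptr, indices


def neighbors(indptr: List[int], indices: List[int], node: int) -> List[int]:
    """Return the targets of `node` as one contiguous slice of `indices`."""
    return indices[indptr[node]:indptr[node + 1]]


edges = [(0, 1), (0, 2), (2, 0), (1, 2)]
indptr, indices = build_csr(3, edges)
print(indptr)                          # [0, 2, 3, 4]
print(indices)                         # [1, 2, 2, 0]
print(neighbors(indptr, indices, 0))   # [1, 2]
```

Because each node's targets sit in one contiguous slice, a neighbour lookup costs two reads of `indptr` plus a slice, with no per-edge search.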

icebug-disk v1

CLI

Convert a DuckDB source database containing nodes_* / edges_* tables into Parquet files and a schema.cypher that a graph database can mount directly:

uv run icebug-format \
  --source-db examples/karate/duckdb/karate_random.duckdb \
  --schema examples/karate/duckdb/schema.cypher  # input schema for rel tables

Output structure

For each node table nodes_<name> and edge table edges_<name>, the following files/tables are produced:

Name	Description
`nodes_<name>.parquet`	Original node table with attributes
`indices_<name>.parquet`	Target node for each edge, sorted by source (size E)
`indptr_<name>.parquet`	Row-pointer array of size N+1
`schema.cypher`	Cypher schema for mounting in a graph database

NOTE: Each Parquet file stores icebug_disk_version in its metadata.

Example

Starting from a demo-db.duckdb with nodes_user, nodes_city, edges_follows, and edges_livesin tables:

uv run icebug-format \
  --source-db demo-db.duckdb \
  --schema demo-db/schema.cypher

Verify the result with test_csr_duckdb.py:

uv run ./icebug-format/test_csr_duckdb.py --input demo-db_csr

Metadata: 7 nodes, 8 edges, directed=True

Node Tables:
Table: demo_nodes_user
(100, 'Adam', 30) ...

Edge Tables (reconstructed from CSR):
Table: follows (FROM user TO user)
(100, 250, 2020) ...
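The "reconstructed from CSR" step in the output above can be pictured with the following sketch. It is not the code of test_csr_duckdb.py; it assumes that `indices` holds row positions into the node table and that a separate list of primary keys (here `node_keys`, a hypothetical name) maps positions back to IDs such as 100 and 250.

```python
from typing import List, Tuple


def csr_to_edges(
    indptr: List[int], indices: List[int], node_keys: List[int]
) -> List[Tuple[int, int]]:
    """Expand CSR arrays back into (source_key, target_key) pairs."""
    edges = []
    for src in range(len(indptr) - 1):
        # Row `src` owns the slice indices[indptr[src]:indptr[src + 1]].
        for pos in range(indptr[src], indptr[src + 1]):
            edges.append((node_keys[src], node_keys[indices[pos]]))
    return edges


# Toy data, unrelated to the demo database above.
node_keys = [100, 250, 300]
indptr = [0, 2, 3, 3]   # node 0 has two edges, node 1 has one, node 2 has none
indices = [1, 2, 0]
edge_list = csr_to_edges(indptr, indices, node_keys)
print(edge_list)        # [(100, 250), (100, 300), (250, 100)]
```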

icebug-memory v1

Python API

Convert Arrow tables directly into an in-memory CSR graph:

from icebug_format import IcebugMemGraph

# Directed heterogeneous graph (different node types on each end)
graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
    from_node_arrow_table=users,   # pa.Table, first column is the primary key
    rel_arrow_table=livesin,       # pa.Table with 'source' and 'target' columns
    to_node_arrow_table=cities,    # pa.Table, first column is the primary key
)

# Homogeneous graph (same node type on both ends) with reverse edges added
graph: IcebugMemGraph = IcebugMemGraph.from_arrow_tables(
    from_node_arrow_table=users,   # pa.Table, first column is the primary key
    rel_arrow_table=follows,       # pa.Table with 'source' and 'target' columns
    add_reverse_edges=True,        # to_node_arrow_table must be omitted
)

# Node tables are passed through unchanged
graph.src    # pa.Table — source nodes
graph.dest   # pa.Table — destination nodes

# CSR adjacency structure
graph.indices  # pa.Table — 'target' column (+ any edge properties), sorted by source
graph.indptr   # pa.Table — 'ptr' column of length len(src) + 1

The rel_arrow_table source and target columns are resolved by name in priority order, with a positional fallback:

Role	Accepted names (in order)	Fallback
Source	`source`, `src`, `from`	0th column
Target	`target`, `destination`, `dest`, `to`	1st column

Any remaining columns are preserved as edge properties in graph.indices.
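The resolution rule in the table can be sketched as below. This is an illustration of the described behaviour, not the library's implementation; the function names are hypothetical, and exact, case-sensitive name matching is an assumption.

```python
from typing import List, Sequence, Tuple

# Accepted names, in the priority order given in the table above.
SOURCE_NAMES = ("source", "src", "from")
TARGET_NAMES = ("target", "destination", "dest", "to")


def resolve_column(columns: Sequence[str], accepted: Sequence[str], fallback_index: int) -> str:
    """Return the first accepted name present, else the column at fallback_index."""
    for name in accepted:
        if name in columns:
            return name
    return columns[fallback_index]


def resolve_endpoints(columns: Sequence[str]) -> Tuple[str, str, List[str]]:
    """Pick the source and target columns; everything else is an edge property."""
    src = resolve_column(columns, SOURCE_NAMES, 0)
    dst = resolve_column(columns, TARGET_NAMES, 1)
    props = [c for c in columns if c not in (src, dst)]
    return src, dst, props


print(resolve_endpoints(["from", "to", "since"]))  # ('from', 'to', ['since'])
print(resolve_endpoints(["a", "b", "weight"]))     # ('a', 'b', ['weight'])
```

With a pyarrow table, the input would be `rel_arrow_table.column_names`.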

Use --add-reverse-edges in the CLI, or add_reverse_edges=True in the Python API, to emit a symmetric adjacency by adding reverse edges. For reverse-edge expansion, to_node_arrow_table must be omitted; the same node table is used for both sides of every edge.
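Conceptually, reverse-edge expansion appends a flipped copy of every edge before the CSR arrays are built. The sketch below shows that idea on plain tuples; `with_reverse_edges` is a hypothetical helper, and copying edge properties onto the reverse edge without deduplicating is an assumption, not a statement about how icebug-format handles these cases.

```python
from typing import List, Tuple


def with_reverse_edges(edges: List[Tuple]) -> List[Tuple]:
    """Return the edges plus a (target, source, *properties) copy of each one."""
    reversed_edges = [(dst, src, *props) for src, dst, *props in edges]
    return edges + reversed_edges


follows = [(0, 1, 2020), (1, 2, 2021)]
symmetric = with_reverse_edges(follows)
print(symmetric)
# [(0, 1, 2020), (1, 2, 2021), (1, 0, 2020), (2, 1, 2021)]
```

This only makes sense when both endpoints index the same node table, which is why a separate destination table cannot be combined with reverse-edge expansion.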

Caveats

icebug-format always outputs a directed graph.
If an algorithm needs symmetric adjacency, pass --add-reverse-edges to the CLI or add_reverse_edges=True to the Python API. Reverse edges will be added automatically. Reverse-edge expansion is supported only for rel tables with the same node type on both ends.
Reverse-edge expansion is all or nothing for a conversion. If your graph mixes edge types that should be symmetric, such as friends, with edge types that should stay directed, such as follows, run separate conversions or add reverse edges before calling icebug-format; --add-reverse-edges cannot be applied selectively per edge type.
