ERC20 Indexing System

A comprehensive multi-chain ERC20 token indexing and analytics system that extracts blockchain data, processes metrics, and provides REST APIs for querying token analytics.

System Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Blockchain     │    │  Google Cloud    │    │  PostgreSQL     │
│  RPC Nodes      │────│  Storage (GCS)   │────│  Database       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                      │
         ▼                       ▼                      ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ multichain-     │    │    BigQuery      │    │ erc20-metrics-  │
│ orchestrator    │───▶│   Data Warehouse │◄───│     cron        │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                      │
         ▼                       │                      ▼
┌─────────────────┐              │              ┌─────────────────┐
│ erc20-processor │              │              │ erc20-query-api │
└─────────────────┘              │              └─────────────────┘
                                 │                      │
                                 ▼                      ▼
                              ┌─────────────────────────────┐
                              │      REST API               │
                              │      Consumers              │
                              └─────────────────────────────┘

Components Overview

1. multichain-orchestrator

Purpose: Extracts raw blockchain data (blocks and ERC20 transfers) using Cryo

  • Configurable chain support (Ethereum, Optimism, Arbitrum, Base)
  • Parallel chunk processing
  • Error handling and retry logic
  • Uploads to Google Cloud Storage

2. erc20-processor

Purpose: Processes raw extracted data and loads into PostgreSQL

  • Transforms raw blockchain data
  • Database schema management
  • Batch processing for performance

3. erc20-metrics-cron

Purpose: Scheduled collection of analytics and metrics from BigQuery

  • Daily, hourly, and holder statistics
  • Automatic synchronization from BigQuery to PostgreSQL
  • Reconciliation and cleanup jobs
  • Support for multiple authentication methods

4. erc20-query-api

Purpose: REST API for querying processed metrics and analytics

  • Token metrics endpoints
  • Historical data queries
  • Whale transfer tracking
  • Performance analytics

Quick Start

Prerequisites

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install Cryo (published as the cryo_cli crate)
cargo install cryo_cli

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash

# Install PostgreSQL (version 14+)
# On Ubuntu/Debian:
sudo apt install postgresql postgresql-contrib

# On macOS:
brew install postgresql
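
To confirm the toolchain is in place (standard version flags; cryo's CLI should accept --version as well):

cargo --version
cryo --version
gcloud --version
psql --version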

Environment Setup

  1. Google Cloud Authentication:
# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login

# Set your project
gcloud config set project YOUR_PROJECT_ID

# For service accounts (production):
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
  2. PostgreSQL Setup:
-- Create database and user
CREATE DATABASE erc20_optimism;
CREATE USER erc20_user WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE erc20_optimism TO erc20_user;
  3. BigQuery Setup:
# Create dataset
bq mk --dataset YOUR_PROJECT_ID:erc20_multichain_india

# Create tables (run once)
cd multichain-orchestrator
./scripts/create-multichain-bq.sh
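
With the environment in place, a quick end-to-end check (names taken from the steps above):

# PostgreSQL: connect as the new user
psql -h localhost -U erc20_user -d erc20_optimism -c "SELECT 1;"

# BigQuery: confirm the dataset and its tables exist
bq ls YOUR_PROJECT_ID:erc20_multichain_india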

Usage Guide

Data Extraction Workflow

1. Extract New Blockchain Data

cd multichain-orchestrator

# Configure extraction range
# Edit configs/optimism-only.toml
[chains.optimism]
chain_id = 10
start_block = 140313367  # Start from where you left off
end_block = 140413367    # 100k new blocks
chunk_size = 100000

# Run extraction
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

2. Upload to Google Cloud Storage

# Upload extracted parquet files to GCS
./scripts/upload-multichain-gcs.sh optimism

# Verify upload
gsutil ls gs://your-bucket/optimism/erc20_transfers/ | tail -5

3. Load into BigQuery (Incremental)

# Use the incremental loading script
./scripts/load-multichain-incremental.sh optimism

# For dry-run to preview what will be loaded:
./scripts/load-multichain-incremental.sh --dry-run optimism

# Force full reload (careful!):
./scripts/load-multichain-incremental.sh --force-replace optimism

4. Process Metrics (Automatic)

The erc20-metrics-cron service runs automatically on schedule:

cd erc20-metrics-cron

# Copy and edit configuration
cp config.toml.example config.toml
# Edit database connection, BigQuery project, etc.

# Run the service
cargo run --release --bin erc20-metrics-cron

# To run reconciliation immediately (cleanup only; doesn't wait for the schedule):
cargo run --release --bin erc20-metrics-cron -- --reconcile

Script Usage Sequence

For adding new blocks to an existing chain:

# 1. Configure new block range
vim multichain-orchestrator/configs/optimism-only.toml

# 2. Extract data
cd multichain-orchestrator
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

# 3. Upload to GCS
./scripts/upload-multichain-gcs.sh optimism

# 4. Load incrementally to BigQuery
./scripts/load-multichain-incremental.sh optimism

# 5. Verify metrics sync (automatic via cron)
# Check logs: tail -f /path/to/metrics-cron.log
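
For repeated runs, steps 2-4 can be chained in a small wrapper; a minimal sketch, assuming it is run from the repository root and the config has already been edited (step 1):

#!/usr/bin/env bash
# index-new-blocks.sh: hypothetical helper chaining extraction, upload, and load
set -euo pipefail

CHAIN="${1:-optimism}"

cd multichain-orchestrator
cargo run --release --bin multichain-orchestrator "configs/${CHAIN}-only.toml"
./scripts/upload-multichain-gcs.sh "$CHAIN"
./scripts/load-multichain-incremental.sh "$CHAIN"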

Component Documentation

multichain-orchestrator

Configuration (configs/optimism-only.toml):

[chains.optimism]
chain_id = 10
start_block = 140313367 
end_block = 140413367
chunk_size = 100000
reorg_buffer = 100
rpcs = [
    { url = "http://your-rpc-url:8545", max_requests_per_second = 1000 }
]

[orchestration]
max_parallel_chains = 1
retry_failed_chunks_interval_minutes = 15
health_check_interval_seconds = 60
checkpoint_interval_minutes = 1
max_retry_attempts = 3
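
As a sanity check on the range above, (end_block - start_block) / chunk_size gives the number of chunks the orchestrator will process:

echo $(( (140413367 - 140313367) / 100000 ))   # prints 1: a single 100k-block chunk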

Key Scripts:

  • upload-multichain-gcs.sh - Upload parquet files to GCS
  • load-multichain-incremental.sh - Incremental BigQuery loading
  • load-multichain-data.sh - Full BigQuery reload (use with caution)

Usage Examples:

# Extract 1 million blocks in 100k chunks
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

# Monitor progress
tail -f extraction_state/log_*.txt

# Check completed chunks
cat extraction_state/completed_blocks.txt

erc20-processor

Purpose: Transform and load raw blockchain data into PostgreSQL

Database Schema:

  • optimism_blocks - Block information
  • optimism_erc20_transfers - ERC20 transfer events
  • Token metadata and analytics tables
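
The processor creates these tables itself; once it has run, psql's \d meta-command shows the actual column layout:

psql -d erc20_optimism -c "\d optimism_blocks"
psql -d erc20_optimism -c "\d optimism_erc20_transfers"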

Usage:

cd erc20-processor

# Process parquet files into PostgreSQL
cargo run --release

# Run with custom configuration
DATABASE_URL="postgresql://user:pass@localhost/db" cargo run --release

erc20-metrics-cron

Purpose: Collect and sync metrics from BigQuery to PostgreSQL

Configuration (config.toml):

[bigquery]
project_id = "your-project-id"
dataset_id = "erc20_multichain_india"

[postgres]
url = "postgresql://erc20_user:password@localhost/erc20_optimism"

[schedules]
daily_metrics = "0 30 */6 * * *"     # Every 6 hours
holder_stats = "0 15 */12 * * *"     # Every 12 hours  
hourly_patterns = "0 45 */4 * * *"   # Every 4 hours
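# Note: these appear to be six-field cron expressions (seconds first);
# "0 30 */6 * * *" fires at second 0, minute 30 of every sixth hour.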

Running Modes:

  1. Scheduled Service (Production):
# Start the cron service
cargo run --release --bin erc20-metrics-cron

# The service will run these jobs on schedule:
# - Daily metrics: 00:30, 06:30, 12:30, 18:30
# - Holder stats: 00:15, 12:15
# - Hourly patterns: 00:45, 04:45, 08:45, 12:45, 16:45, 20:45
  2. Manual Reconciliation:
# Run cleanup only (doesn't sync new data)
cargo run --release --bin erc20-metrics-cron -- --reconcile
  3. Force Initial Sync:
# Clear sync status to force re-sync
psql -d erc20_optimism -c "DELETE FROM sync_status WHERE metric_type = 'daily_metrics';"

# Restart service - it will run initial collection
cargo run --release --bin erc20-metrics-cron

Authentication Setup:

# For service account (recommended for production)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# For user account (development)
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"

Monitored Tables:

  • token_daily_metrics - Daily aggregated statistics
  • whale_transfers - Large value transfers (>$10k)
  • token_rankings - Daily token rankings by volume/activity
  • sync_status - Tracks synchronization state
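
A quick way to confirm the sync jobs are populating these tables (column names taken from the queries elsewhere in this README):

psql -d erc20_optimism -c "SELECT metric_type, updated_at FROM sync_status ORDER BY updated_at DESC;"
psql -d erc20_optimism -c "SELECT COUNT(*) FROM token_daily_metrics;"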

erc20-query-api

Purpose: REST API for querying processed ERC20 metrics

Starting the API:

cd erc20-query-api

# Start on default port 8080
cargo run --release

# Custom port and database
DATABASE_URL="postgresql://user:pass@localhost/db" PORT=3000 cargo run --release

API Endpoints:

1. Token Daily Metrics

# Get daily metrics for a specific token
curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/daily-metrics?days=7"

# Response format:
{
  "data": [
    {
      "date": "2025-08-29",
      "daily_active_addresses": 1234,
      "daily_transactions": 5678,
      "daily_transfers": 9012,
      "daily_volume": "123.456789",
      "daily_minted": "10.0",
      "daily_burned": "5.0"
    }
  ],
  "count": 7
}

2. Whale Transfers

# Get large transfers for a token
curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/whale-transfers?limit=10&min_usd=50000"

# Response format:
{
  "data": [
    {
      "transfer_date": "2025-08-29",
      "transaction_hash": "0xabc123...",
      "from_address": "0xdef456...",
      "to_address": "0x789abc...",
      "amount": "100.0",
      "amount_usd": "150000.50"
    }
  ],
  "count": 10
}

3. Token Rankings

# Get daily token rankings
curl "http://localhost:8080/api/v1/optimism/rankings/daily?limit=20"

# Response format:
{
  "data": [
    {
      "ranking_date": "2025-08-29",
      "token_address": "0x4200000000000000000000000000000000000006",
      "rank_by_volume": 1,
      "rank_by_transactions": 1,
      "volume_24h": "1000000.50",
      "transactions_24h": 15000
    }
  ],
  "count": 20
}

4. Hourly Patterns

# Get hourly activity patterns  
curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/hourly-patterns?hours=24"

# Response format:
{
  "data": [
    {
      "pattern_hour": "2025-08-29T10:00:00Z",
      "hourly_transactions": 156,
      "hourly_volume": "12345.67",
      "hourly_active_addresses": 89
    }
  ],
  "count": 24
}

Query Parameters:

  • days - Number of days of historical data (default: 30, max: 365)
  • hours - Number of hours of historical data (default: 24, max: 168)
  • limit - Maximum number of results (default: 100, max: 1000)
  • offset - Results offset for pagination (default: 0)
  • min_usd - Minimum USD value for whale transfers (default: 10000)
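
Parameters combine as ordinary query-string arguments; for example, paging through whale transfers 100 at a time (second page shown):

curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/whale-transfers?limit=100&offset=100&min_usd=10000"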

Health Check:

curl http://localhost:8080/health
# Response: {"status": "healthy"}

Operations Guide

Daily Operations

  1. Monitor Data Freshness (a staleness sketch follows this list):
# Check latest block in BigQuery
bq query --use_legacy_sql=false "SELECT MAX(block_number) FROM \`project.dataset.erc20_transfers_optimism\`"

# Check sync status
psql -d erc20_optimism -c "SELECT * FROM sync_status ORDER BY updated_at DESC LIMIT 5;"
  2. Monitor Service Health:
# Check if metrics cron is running
ps aux | grep erc20-metrics-cron

# Check API health
curl http://localhost:8080/health

# View recent logs
tail -f /path/to/logs/erc20-metrics-cron.log
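
A minimal staleness check that could run from cron; hypothetical script, assuming sync_status carries the updated_at column queried above (the 12-hour threshold is arbitrary):

#!/usr/bin/env bash
# check-freshness.sh: warn if any metric has not synced within the last 12 hours
STALE=$(psql -t -A -d erc20_optimism -c \
  "SELECT COUNT(*) FROM sync_status WHERE updated_at < NOW() - INTERVAL '12 hours';")
if [ "$STALE" -gt 0 ]; then
  echo "WARNING: $STALE sync_status rows are older than 12 hours"
fi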

Adding New Tokens

  1. Update Configuration:
# Add to erc20-metrics-cron/config.toml
[[tokens.optimism]]
address = "0xNewTokenAddress"
symbol = "NEW"
name = "New Token"
decimals = 18
priority = 5
  2. Clear Sync Status (to force historical sync):
-- This will make the cron collect data from day 1 for the new token
DELETE FROM sync_status 
WHERE token_address = '0xNewTokenAddress';
  3. Restart Service:
# Restart metrics cron to pick up new token
pkill erc20-metrics-cron
cargo run --release --bin erc20-metrics-cron
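
To confirm the historical sync picked up the new token (column names assumed from the sync_status delete above and the daily-metrics API response):

psql -d erc20_optimism -c "SELECT date, daily_transfers FROM token_daily_metrics WHERE token_address = '0xNewTokenAddress' ORDER BY date DESC LIMIT 5;"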

Adding New Blocks

The recommended workflow for adding new blocks to an existing chain:

  1. Update Configuration:
# Edit configs/optimism-only.toml
# Set start_block to where you left off
# Set end_block to the new target
  2. Extract and Upload:
# Extract new data
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

# Upload to GCS  
./scripts/upload-multichain-gcs.sh optimism
  3. Load Incrementally:
# Only loads new blocks, won't duplicate existing data
./scripts/load-multichain-incremental.sh optimism
  4. Verify:
# Check block range in BigQuery
bq query --use_legacy_sql=false "SELECT MIN(block_number), MAX(block_number), COUNT(*) FROM \`project.dataset.erc20_transfers_optimism\`"

Troubleshooting

Authentication Issues:

# Check current auth
gcloud auth list

# Refresh application default credentials
gcloud auth application-default login

# For service account issues
gcloud iam service-accounts keys list --iam-account=your-service-account@project.iam.gserviceaccount.com

BigQuery Query Failures:

# Check permissions
bq ls --project_id=your-project-id

# Test simple query
bq query --use_legacy_sql=false "SELECT 1"

Database Connection Issues:

# Test PostgreSQL connection
psql -h localhost -U erc20_user -d erc20_optimism -c "SELECT 1;"

# Check running processes
sudo netstat -tlnp | grep :5432

Incremental Loading Issues:

# Debug what would be loaded
./scripts/load-multichain-incremental.sh --dry-run optimism

# Check for overlapping files
./scripts/load-multichain-incremental.sh --verbose optimism

Performance and Scaling

Optimization Tips

  1. Parallel Processing:
# Increase parallel chains for multi-chain setups
max_parallel_chains = 4

# Optimize chunk size based on chain
chunk_size = 100000  # Good for Optimism
chunk_size = 1000    # Better for Ethereum mainnet, where blocks carry far more transfers
  2. RPC Optimization:
# Use multiple RPC endpoints
rpcs = [
    { url = "http://rpc1:8545", max_requests_per_second = 1000 },
    { url = "http://rpc2:8545", max_requests_per_second = 1000 }
]
  3. BigQuery Optimization:
# Use clustering for better query performance
bq mk --table --clustering_fields=block_number,erc20 project:dataset.table

Development

Building from Source

# Clone the repository
git clone <repository-url>
cd erc20_indexer

# Build all components
cargo build --release
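
If the components are organized as a Cargo workspace (which the per-directory layout suggests), individual binaries can also be built with -p:

cargo build --release -p erc20-query-api
cargo build --release -p erc20-metrics-cron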
