ERC20 Indexing System

A comprehensive multi-chain ERC20 token indexing and analytics system that extracts blockchain data, processes metrics, and provides REST APIs for querying token analytics.

System Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Blockchain     │    │  Google Cloud    │    │  PostgreSQL     │
│  RPC Nodes      │────│  Storage (GCS)   │────│  Database       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                      │
         ▼                       ▼                      ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ multichain-     │    │    BigQuery      │    │ erc20-metrics-  │
│ orchestrator    │───▶│   Data Warehouse │◄───│     cron        │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                      │
         ▼                       │                      ▼
┌─────────────────┐              │              ┌─────────────────┐
│ erc20-processor │              │              │ erc20-query-api │
└─────────────────┘              │              └─────────────────┘
                                 │                      │
                                 ▼                      ▼
                              ┌─────────────────────────────┐
                              │      REST API               │
                              │      Consumers              │
                              └─────────────────────────────┘

Components Overview

1. multichain-orchestrator

Purpose: Extracts raw blockchain data (blocks and ERC20 transfers) using Cryo

  • Configurable chain support (Ethereum, Optimism, Arbitrum, Base)
  • Parallel chunk processing
  • Error handling and retry logic
  • Uploads to Google Cloud Storage

2. erc20-processor

Purpose: Processes raw extracted data and loads into PostgreSQL

  • Transforms raw blockchain data
  • Database schema management
  • Batch processing for performance

3. erc20-metrics-cron

Purpose: Scheduled collection of analytics and metrics from BigQuery

  • Daily, hourly, and holder statistics
  • Automatic synchronization from BigQuery to PostgreSQL
  • Reconciliation and cleanup jobs
  • Support for multiple authentication methods

4. erc20-query-api

Purpose: REST API for querying processed metrics and analytics

  • Token metrics endpoints
  • Historical data queries
  • Whale transfer tracking
  • Performance analytics

Quick Start

Prerequisites

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install Cryo (published as the cryo_cli crate)
cargo install cryo_cli

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash

# Install PostgreSQL (version 14+)
# On Ubuntu/Debian:
sudo apt install postgresql postgresql-contrib

# On macOS:
brew install postgresql
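
To confirm the toolchain is in place (standard version flags; cryo's CLI should accept --version as well):

cargo --version
cryo --version
gcloud --version
psql --version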

Environment Setup

  1. Google Cloud Authentication:
# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login

# Set your project
gcloud config set project YOUR_PROJECT_ID

# For service accounts (production):
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
  2. PostgreSQL Setup:
-- Create database and user
CREATE DATABASE erc20_optimism;
CREATE USER erc20_user WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE erc20_optimism TO erc20_user;
  3. BigQuery Setup:
# Create dataset
bq mk --dataset YOUR_PROJECT_ID:erc20_multichain_india

# Create tables (run once)
cd multichain-orchestrator
./scripts/create-multichain-bq.sh
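
With the environment in place, a quick end-to-end check (names taken from the steps above):

# PostgreSQL: connect as the new user
psql -h localhost -U erc20_user -d erc20_optimism -c "SELECT 1;"

# BigQuery: confirm the dataset and its tables exist
bq ls YOUR_PROJECT_ID:erc20_multichain_india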

Usage Guide

Data Extraction Workflow

1. Extract New Blockchain Data

cd multichain-orchestrator

# Configure extraction range
# Edit configs/optimism-only.toml
[chains.optimism]
chain_id = 10
start_block = 140313367  # Start from where you left off
end_block = 140413367    # 100k new blocks
chunk_size = 100000

# Run extraction
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

2. Upload to Google Cloud Storage

# Upload extracted parquet files to GCS
./scripts/upload-multichain-gcs.sh optimism

# Verify upload
gsutil ls gs://your-bucket/optimism/erc20_transfers/ | tail -5

3. Load into BigQuery (Incremental)

# Use the incremental loading script
./scripts/load-multichain-incremental.sh optimism

# For dry-run to preview what will be loaded:
./scripts/load-multichain-incremental.sh --dry-run optimism

# Force full reload (careful!):
./scripts/load-multichain-incremental.sh --force-replace optimism

4. Process Metrics (Automatic)

The erc20-metrics-cron service runs automatically on schedule:

cd erc20-metrics-cron

# Copy and edit configuration
cp config.toml.example config.toml
# Edit database connection, BigQuery project, etc.

# Run the service
cargo run --release --bin erc20-metrics-cron

# To run reconciliation immediately (cleanup only; doesn't wait for the schedule):
cargo run --release --bin erc20-metrics-cron -- --reconcile

Script Usage Sequence

For adding new blocks to an existing chain:

# 1. Configure new block range
vim multichain-orchestrator/configs/optimism-only.toml

# 2. Extract data
cd multichain-orchestrator
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

# 3. Upload to GCS
./scripts/upload-multichain-gcs.sh optimism

# 4. Load incrementally to BigQuery
./scripts/load-multichain-incremental.sh optimism

# 5. Verify metrics sync (automatic via cron)
# Check logs: tail -f /path/to/metrics-cron.log
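
For repeated runs, steps 2-4 can be chained in a small wrapper; a minimal sketch, assuming it is run from the repository root and the config has already been edited (step 1):

#!/usr/bin/env bash
# index-new-blocks.sh: hypothetical helper chaining extraction, upload, and load
set -euo pipefail

CHAIN="${1:-optimism}"

cd multichain-orchestrator
cargo run --release --bin multichain-orchestrator "configs/${CHAIN}-only.toml"
./scripts/upload-multichain-gcs.sh "$CHAIN"
./scripts/load-multichain-incremental.sh "$CHAIN"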

Component Documentation

multichain-orchestrator

Configuration (configs/optimism-only.toml):

[chains.optimism]
chain_id = 10
start_block = 140313367 
end_block = 140413367
chunk_size = 100000
reorg_buffer = 100
rpcs = [
    { url = "http://your-rpc-url:8545", max_requests_per_second = 1000 }
]

[orchestration]
max_parallel_chains = 1
retry_failed_chunks_interval_minutes = 15
health_check_interval_seconds = 60
checkpoint_interval_minutes = 1
max_retry_attempts = 3
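
As a sanity check on the range above, (end_block - start_block) / chunk_size gives the number of chunks the orchestrator will process:

echo $(( (140413367 - 140313367) / 100000 ))   # prints 1: a single 100k-block chunk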

Key Scripts:

  • upload-multichain-gcs.sh - Upload parquet files to GCS
  • load-multichain-incremental.sh - Incremental BigQuery loading
  • load-multichain-data.sh - Full BigQuery reload (use with caution)

Usage Examples:

# Extract 1 million blocks in 100k chunks
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

# Monitor progress
tail -f extraction_state/log_*.txt

# Check completed chunks
cat extraction_state/completed_blocks.txt

erc20-processor

Purpose: Transform and load raw blockchain data into PostgreSQL

Database Schema:

  • optimism_blocks - Block information
  • optimism_erc20_transfers - ERC20 transfer events
  • Token metadata and analytics tables
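
The processor creates these tables itself; once it has run, psql's \d meta-command shows the actual column layout:

psql -d erc20_optimism -c "\d optimism_blocks"
psql -d erc20_optimism -c "\d optimism_erc20_transfers"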

Usage:

cd erc20-processor

# Process parquet files into PostgreSQL
cargo run --release

# Run with custom configuration
DATABASE_URL="postgresql://user:pass@localhost/db" cargo run --release

erc20-metrics-cron

Purpose: Collect and sync metrics from BigQuery to PostgreSQL

Configuration (config.toml):

[bigquery]
project_id = "your-project-id"
dataset_id = "erc20_multichain_india"

[postgres]
url = "postgresql://erc20_user:password@localhost/erc20_optimism"

[schedules]
daily_metrics = "0 30 */6 * * *"     # Every 6 hours
holder_stats = "0 15 */12 * * *"     # Every 12 hours  
hourly_patterns = "0 45 */4 * * *"   # Every 4 hours
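# Note: these appear to be six-field cron expressions (seconds first);
# "0 30 */6 * * *" fires at second 0, minute 30 of every sixth hour.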

Running Modes:

  1. Scheduled Service (Production):
# Start the cron service
cargo run --release --bin erc20-metrics-cron

# The service will run these jobs on schedule:
# - Daily metrics: 00:30, 06:30, 12:30, 18:30
# - Holder stats: 00:15, 12:15
# - Hourly patterns: 00:45, 04:45, 08:45, 12:45, 16:45, 20:45
  2. Manual Reconciliation:
# Run cleanup only (doesn't sync new data)
cargo run --release --bin erc20-metrics-cron -- --reconcile
  3. Force Initial Sync:
# Clear sync status to force re-sync
psql -d erc20_optimism -c "DELETE FROM sync_status WHERE metric_type = 'daily_metrics';"

# Restart service - it will run initial collection
cargo run --release --bin erc20-metrics-cron

Authentication Setup:

# For service account (recommended for production)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# For user account (development)
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"

Monitored Tables:

  • token_daily_metrics - Daily aggregated statistics
  • whale_transfers - Large value transfers (>$10k)
  • token_rankings - Daily token rankings by volume/activity
  • sync_status - Tracks synchronization state
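
A quick way to confirm the sync jobs are populating these tables (column names taken from the queries elsewhere in this README):

psql -d erc20_optimism -c "SELECT metric_type, updated_at FROM sync_status ORDER BY updated_at DESC;"
psql -d erc20_optimism -c "SELECT COUNT(*) FROM token_daily_metrics;"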

erc20-query-api

Purpose: REST API for querying processed ERC20 metrics

Starting the API:

cd erc20-query-api

# Start on default port 8080
cargo run --release

# Custom port and database
DATABASE_URL="postgresql://user:pass@localhost/db" PORT=3000 cargo run --release

API Endpoints:

1. Token Daily Metrics

# Get daily metrics for a specific token
curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/daily-metrics?days=7"

# Response format:
{
  "data": [
    {
      "date": "2025-08-29",
      "daily_active_addresses": 1234,
      "daily_transactions": 5678,
      "daily_transfers": 9012,
      "daily_volume": "123.456789",
      "daily_minted": "10.0",
      "daily_burned": "5.0"
    }
  ],
  "count": 7
}

2. Whale Transfers

# Get large transfers for a token
curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/whale-transfers?limit=10&min_usd=50000"

# Response format:
{
  "data": [
    {
      "transfer_date": "2025-08-29",
      "transaction_hash": "0xabc123...",
      "from_address": "0xdef456...",
      "to_address": "0x789abc...",
      "amount": "100.0",
      "amount_usd": "150000.50"
    }
  ],
  "count": 10
}

3. Token Rankings

# Get daily token rankings
curl "http://localhost:8080/api/v1/optimism/rankings/daily?limit=20"

# Response format:
{
  "data": [
    {
      "ranking_date": "2025-08-29",
      "token_address": "0x4200000000000000000000000000000000000006",
      "rank_by_volume": 1,
      "rank_by_transactions": 1,
      "volume_24h": "1000000.50",
      "transactions_24h": 15000
    }
  ],
  "count": 20
}

4. Hourly Patterns

# Get hourly activity patterns  
curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/hourly-patterns?hours=24"

# Response format:
{
  "data": [
    {
      "pattern_hour": "2025-08-29T10:00:00Z",
      "hourly_transactions": 156,
      "hourly_volume": "12345.67",
      "hourly_active_addresses": 89
    }
  ],
  "count": 24
}

Query Parameters:

  • days - Number of days of historical data (default: 30, max: 365)
  • hours - Number of hours of historical data (default: 24, max: 168)
  • limit - Maximum number of results (default: 100, max: 1000)
  • offset - Results offset for pagination (default: 0)
  • min_usd - Minimum USD value for whale transfers (default: 10000)
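
Parameters combine as ordinary query-string arguments; for example, paging through whale transfers 100 at a time (second page shown):

curl "http://localhost:8080/api/v1/optimism/tokens/0x4200000000000000000000000000000000000006/whale-transfers?limit=100&offset=100&min_usd=10000"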

Health Check:

curl http://localhost:8080/health
# Response: {"status": "healthy"}

Operations Guide

Daily Operations

  1. Monitor Data Freshness (a staleness sketch follows this list):
# Check latest block in BigQuery
bq query --use_legacy_sql=false "SELECT MAX(block_number) FROM \`project.dataset.erc20_transfers_optimism\`"

# Check sync status
psql -d erc20_optimism -c "SELECT * FROM sync_status ORDER BY updated_at DESC LIMIT 5;"
  2. Monitor Service Health:
# Check if metrics cron is running
ps aux | grep erc20-metrics-cron

# Check API health
curl http://localhost:8080/health

# View recent logs
tail -f /path/to/logs/erc20-metrics-cron.log
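
A minimal staleness check that could run from cron; hypothetical script, assuming sync_status carries the updated_at column queried above (the 12-hour threshold is arbitrary):

#!/usr/bin/env bash
# check-freshness.sh: warn if any metric has not synced within the last 12 hours
STALE=$(psql -t -A -d erc20_optimism -c \
  "SELECT COUNT(*) FROM sync_status WHERE updated_at < NOW() - INTERVAL '12 hours';")
if [ "$STALE" -gt 0 ]; then
  echo "WARNING: $STALE sync_status rows are older than 12 hours"
fi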

Adding New Tokens

  1. Update Configuration:
# Add to erc20-metrics-cron/config.toml
[[tokens.optimism]]
address = "0xNewTokenAddress"
symbol = "NEW"
name = "New Token"
decimals = 18
priority = 5
  2. Clear Sync Status (to force historical sync):
-- This will make the cron collect data from day 1 for the new token
DELETE FROM sync_status 
WHERE token_address = '0xNewTokenAddress';
  3. Restart Service:
# Restart metrics cron to pick up new token
pkill erc20-metrics-cron
cargo run --release --bin erc20-metrics-cron
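
To confirm the historical sync picked up the new token (column names assumed from the sync_status delete above and the daily-metrics API response):

psql -d erc20_optimism -c "SELECT date, daily_transfers FROM token_daily_metrics WHERE token_address = '0xNewTokenAddress' ORDER BY date DESC LIMIT 5;"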

Adding New Blocks

The recommended workflow for adding new blocks to an existing chain:

  1. Update Configuration:
# Edit configs/optimism-only.toml
# Set start_block to where you left off
# Set end_block to the new target
  2. Extract and Upload:
# Extract new data
cargo run --release --bin multichain-orchestrator configs/optimism-only.toml

# Upload to GCS  
./scripts/upload-multichain-gcs.sh optimism
  3. Load Incrementally:
# Only loads new blocks, won't duplicate existing data
./scripts/load-multichain-incremental.sh optimism
  4. Verify:
# Check block range in BigQuery
bq query --use_legacy_sql=false "SELECT MIN(block_number), MAX(block_number), COUNT(*) FROM \`project.dataset.erc20_transfers_optimism\`"

Troubleshooting

Authentication Issues:

# Check current auth
gcloud auth list

# Refresh application default credentials
gcloud auth application-default login

# For service account issues
gcloud iam service-accounts keys list --iam-account=your-service-account@project.iam.gserviceaccount.com

BigQuery Query Failures:

# Check permissions
bq ls --project_id=your-project-id

# Test simple query
bq query --use_legacy_sql=false "SELECT 1"

Database Connection Issues:

# Test PostgreSQL connection
psql -h localhost -U erc20_user -d erc20_optimism -c "SELECT 1;"

# Check running processes
sudo netstat -tlnp | grep :5432

Incremental Loading Issues:

# Debug what would be loaded
./scripts/load-multichain-incremental.sh --dry-run optimism

# Check for overlapping files
./scripts/load-multichain-incremental.sh --verbose optimism

Performance and Scaling

Optimization Tips

  1. Parallel Processing:
# Increase parallel chains for multi-chain setups
max_parallel_chains = 4

# Optimize chunk size based on chain
chunk_size = 100000  # Good for Optimism
chunk_size = 1000    # Better for Ethereum mainnet, where blocks carry far more transfers
  2. RPC Optimization:
# Use multiple RPC endpoints
rpcs = [
    { url = "http://rpc1:8545", max_requests_per_second = 1000 },
    { url = "http://rpc2:8545", max_requests_per_second = 1000 }
]
  3. BigQuery Optimization:
# Use clustering for better query performance
bq mk --table --clustering_fields=block_number,erc20 project:dataset.table

Development

Building from Source

# Clone the repository
git clone <repository-url>
cd erc20_indexer

# Build all components
cargo build --release
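
If the components are organized as a Cargo workspace (which the per-directory layout suggests), individual binaries can also be built with -p:

cargo build --release -p erc20-query-api
cargo build --release -p erc20-metrics-cron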
