GitHub Contributor Analytics

A modern analytics pipeline for tracking and analyzing GitHub contributions across repositories. The system processes contributor data, generates AI-powered summaries, and maintains a leaderboard of developer activity.

Prerequisites

  • Bun (recommended) or Node.js 18+
  • GitHub Personal Access Token with repo scope
  • OpenRouter API Key (optional, for AI summaries)
  • uv (optional, for syncing from production DB)

Features

  • Tracks pull requests, issues, reviews, and comments
  • Calculates contributor scores based on activity and impact
  • Generates AI-powered summaries of contributions
  • Exports daily summaries to JSON files
  • Maintains contributor expertise levels and focus areas
  • Interactive contributor profile pages
  • Activity visualizations and metrics
  • Daily, weekly, and monthly reports
  • Smart contributor scoring system

Wallet Linking (Optional Feature)

Contributors can optionally link their Ethereum and Solana wallet addresses to their GitHub profiles. When configured, users can authenticate via GitHub OAuth and store wallet addresses in their profile README.

Setup: See auth-worker/README.md for Cloudflare Worker deployment and OAuth configuration.

Required secrets (if enabling):

  • NEXT_PUBLIC_GITHUB_CLIENT_ID - GitHub OAuth App Client ID
  • NEXT_PUBLIC_AUTH_WORKER_URL - Deployed Cloudflare Worker URL

Note: The leaderboard works perfectly fine without this feature. It's purely additive.

Setup

  1. Install dependencies:
bun install
  2. Set up environment variables in .env, using .env.example as a reference:
# Required for GitHub ingest
GITHUB_TOKEN=your_github_personal_access_token_here
# Required for AI summaries
OPENROUTER_API_KEY=your_api_key_here
# Optional: configure your local environment to use a cheaper model
LARGE_MODEL=openai/gpt-4o-mini

# Optional site info (auto-detected in CI if not set)
SITE_URL=https://your-deployment-url.com
SITE_NAME="Contributor Leaderboard"

# Optional: For wallet linking feature
NEXT_PUBLIC_GITHUB_CLIENT_ID=
NEXT_PUBLIC_AUTH_WORKER_URL=

Then load the environment variables:

source .envrc
# Or if using direnv: direnv allow
  3. Configure the repositories to track in config/pipeline.config.ts. See the config file for the full schema and options.
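
A minimal, illustrative sketch of a config entry (the field names here are assumptions; the actual schema is defined in config/pipeline.config.ts and may differ):

// config/pipeline.config.ts - illustrative sketch only, not the real schema
export default {
  // Repositories whose activity should be ingested and scored (hypothetical shape)
  repositories: [
    { owner: "elizaOS", name: "eliza" },
    // { owner: "your-org", name: "your-repo" },
  ],
};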

  4. Initialize the database

You can either initialize an empty database or sync the latest data from production:

Option A - Initialize Empty Database:

# Apply migrations
bun run db:migrate

Option B - Sync Production Data:

To download all historical data from the production data branch, instead of re-ingesting and regenerating it yourself, use the data:sync command (which depends on uv).

# Install uv first if you don't have it (required for database restoration)

pipx install uv  # Recommended method
# OR
brew install uv  # macOS with Homebrew

# More installation options: https://docs.astral.sh/uv/getting-started/installation/
# Download the latest data from production
bun run data:sync
# Or, if you are on a fork:
bun run data:sync --remote upstream
# This will:
# - Fetch the latest data from the _data branch
# - Copy all data files (stats, summaries, etc.)
# - Restore the SQLite database from the diffable dump

# If you made local changes to the schema that don't exist in prod DB:
bun run db:generate
bun run db:migrate

The data sync utility supports several options:

# View all options
bun run data:sync --help

# Skip confirmation prompts (useful in scripts)
bun run data:sync -y

# Sync from a different remote (if you've added one)
bun run data:sync --remote upstream

# Skip database restoration (only sync generated JSON/MD files)
bun run data:sync --skip-db

# Delete all local data and force sync
bun run data:sync --force

After syncing or initializing the database, you can explore it using Drizzle Studio:

# Launch the database explorer
bun run db:studio

If you encounter any issues with Drizzle Studio due to Node.js version mismatches, you can use a different SQLite browser tool like SQLite Browser.

Quick Start

# Ingest recent data
bun run pipeline ingest --days 90

# Process and calculate scores
bun run pipeline process

# Build the site
bun run build

Commands and Capabilities

You can see the main pipelines and their usage with the commands below:

bun run pipeline ingest -h
bun run pipeline process -h
bun run pipeline export -h
bun run pipeline summarize -h

Data Ingestion

# Ingest the latest GitHub data (defaults to the time since the last fetch, or the past 7 days)
bun run pipeline ingest

# Ingest from the beginning (everything after a given start date)
bun run pipeline ingest --after 2024-10-15

# Ingest with specific date range
bun run pipeline ingest --after 2025-01-01 --before 2025-02-20

# Ingest data for a specific number of days
bun run pipeline ingest --days 30 --before 2024-03-31

# Ingest with verbose logging
bun run pipeline ingest -v

# Ingest with custom config file
bun run pipeline ingest --config custom-config.ts

Data Processing and Analysis

# Process and analyze all repositories
bun run pipeline process

# Force recalculation of scores even if they already exist
bun run pipeline process --force

# Process specific repository
bun run pipeline process --repository owner/repo

# Process with verbose logging
bun run pipeline process -v

# Process with custom config
bun run pipeline process --config custom-config.ts

Generating Stats and Exports

# Export repository stats (defaults to 30 days)
bun run pipeline export

# Export with specific date range
bun run pipeline export --after 2025-01-01 --before 2025-02-20

# Export for a specific number of days
bun run pipeline export --days 60

# Export all data since contributionStartDate
bun run pipeline export --all

# Export for specific repository
bun run pipeline export -r owner/repo

# Export to custom directory
bun run pipeline export --output-dir ./custom-dir/

# Export with verbose logging
bun run pipeline export -v

# Regenerate and overwrite existing files
bun run pipeline export --force

AI Summary Generation

Generated project summaries are stored in data/<owner_repo>/<interval>/summaries/summary_<date>.json.

# Generate repository-level summaries
bun run pipeline summarize -t repository

# Generate overall summaries (after repository summaries are generated)
bun run pipeline summarize -t overall

# Generate contributor summaries
bun run pipeline summarize -t contributors

# Generate summaries with specific date range
bun run pipeline summarize -t repository --after 2025-01-01 --before 2025-02-20

# Force overwrite existing summaries
bun run pipeline summarize -t repository --force

# Generate and overwrite summaries for a specific number of days (default 7 days)
bun run pipeline summarize -t repository --days 90 --force

# Generate repository summaries for all data since contributionStartDate
bun run pipeline summarize -t repository --all

# Generate summaries for a specific repository
bun run pipeline summarize -t repository --repository owner/repo

# Generate only daily and weekly contributor summaries
bun run pipeline summarize -t contributors --daily --weekly

# Generate summaries with verbose logging
bun run pipeline summarize -t repository -v

By default, the summarize command won't regenerate summaries that already exist for a given day. To regenerate them, pass the -f/--force flag.
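
If you want to consume the stored files from your own scripts, a minimal TypeScript sketch might look like this (the path layout comes from above; the file's internal shape is not assumed here, and the repository name and date are placeholders):

// Load a stored repository summary from data/<owner_repo>/<interval>/summaries/summary_<date>.json
import { readFile } from "node:fs/promises";

async function loadSummary(ownerRepo: string, interval: "day" | "week" | "month", date: string): Promise<unknown> {
  const path = `data/${ownerRepo}/${interval}/summaries/summary_${date}.json`;
  const raw = await readFile(path, "utf8");
  return JSON.parse(raw);
}

// Example usage with placeholder values:
loadSummary("owner_repo", "day", "2025-01-15").then((summary) => {
  console.log(Object.keys(summary as Record<string, unknown>));
});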

Static JSON API

The pipeline generates static JSON API endpoints that can be consumed by external tools, dashboards, or AI agents. These files are generated during pipeline execution and served as static files.

Leaderboard API

# Generate leaderboard API endpoints
bun run pipeline export-leaderboard

# With custom limit (default 100, 0 = unlimited)
bun run pipeline export-leaderboard --limit 50

# Output to custom directory
bun run pipeline export-leaderboard --output-dir ./custom-dir/

Endpoints:

  • /api/leaderboard-monthly.json - Current month's leaderboard
  • /api/leaderboard-weekly.json - Current week's leaderboard
  • /api/leaderboard-lifetime.json - All-time leaderboard

API Base URL: https://{your-domain}/api/

For GitHub Pages deployments, your base URL follows this pattern:

  • Org/user site (username.github.io repo): https://{username}.github.io/api/
  • Project site (any other repo): https://{username}.github.io/{repo-name}/api/

Response structure:

{
  "version": "1.0",
  "period": "monthly",
  "startDate": "2025-01-01",
  "endDate": "2025-01-31",
  "generatedAt": "2025-01-15T12:00:00Z",
  "totalUsers": 150,
  "leaderboard": [
    {
      "rank": 1,
      "username": "contributor1",
      "avatarUrl": "https://...",
      "score": 1250,
      "prScore": 800,
      "issueScore": 200,
      "reviewScore": 150,
      "commentScore": 100,
      "wallets": { "solana": "...", "ethereum": "..." }
    }
  ]
}
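
As a consumer-side illustration, the monthly endpoint can be fetched and typed against the structure above. This is a sketch, not part of the pipeline itself; the base URL is a placeholder and the field optionality is an assumption:

// Minimal TypeScript consumer for the leaderboard API
const API_BASE = "https://your-username.github.io/your-repo-name/api"; // placeholder

interface LeaderboardEntry {
  rank: number;
  username: string;
  avatarUrl: string;
  score: number;
  prScore: number;
  issueScore: number;
  reviewScore: number;
  commentScore: number;
  wallets?: { solana?: string; ethereum?: string };
}

interface LeaderboardResponse {
  version: string;
  period: string;
  startDate: string;
  endDate: string;
  generatedAt: string;
  totalUsers: number;
  leaderboard: LeaderboardEntry[];
}

async function fetchMonthlyLeaderboard(): Promise<LeaderboardResponse> {
  const res = await fetch(`${API_BASE}/leaderboard-monthly.json`);
  if (!res.ok) throw new Error(`Failed to fetch leaderboard: ${res.status}`);
  return (await res.json()) as LeaderboardResponse;
}

// Example: print the top 10 contributors by score
fetchMonthlyLeaderboard().then((data) => {
  for (const entry of data.leaderboard.slice(0, 10)) {
    console.log(`${entry.rank}. ${entry.username} - ${entry.score}`);
  }
});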

Summary API

Summaries are generated alongside markdown files during the summarize command. JSON API artifacts include metadata for caching and validation.

Endpoints:

  • /api/summaries/overall/{interval}/{date}.json - Overall summary for a specific date
  • /api/summaries/overall/{interval}/latest.json - Most recent overall summary
  • /api/summaries/overall/{interval}/index.json - Index of all overall summaries
  • /api/summaries/repos/{owner}_{repo}/{interval}/{date}.json - Repository summary for a specific date
  • /api/summaries/repos/{owner}_{repo}/{interval}/latest.json - Most recent repository summary
  • /api/summaries/repos/{owner}_{repo}/{interval}/index.json - Index of all repository summaries
  • /api/summaries/contributors/{username}/{interval}/{date}.json - Contributor summary for a specific date
  • /api/summaries/contributors/{username}/{interval}/latest.json - Most recent contributor summary
  • /api/summaries/contributors/{username}/{interval}/index.json - Index of all contributor summaries

Where {interval} is one of: day, week, month

Response structure:

{
  "version": "1.0",
  "type": "overall",
  "interval": "day",
  "date": "2025-01-15",
  "generatedAt": "2025-01-15T23:00:00Z",
  "sourceLastUpdated": "2025-01-15T23:00:00Z",
  "contentFormat": "markdown",
  "contentHash": "sha256...",
  "entity": { "repoId": "owner/repo" },
  "content": "# Summary\n\n..."
}

Index structure:

{
  "version": "1.0",
  "type": "overall",
  "interval": "day",
  "generatedAt": "2025-01-15T23:00:00Z",
  "items": [
    {
      "date": "2025-01-15",
      "sourceLastUpdated": "...",
      "contentHash": "...",
      "path": "2025-01-15.json"
    }
  ]
}
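
To show how the contentHash metadata can support caching and validation, here is a consumer-side sketch that reads an index and refetches only entries whose hash has changed. The field names come from the structures above; the base URL and the caching policy are just assumptions for illustration:

// Use index.json + contentHash to skip refetching unchanged overall summaries
const SUMMARY_API_BASE = "https://your-username.github.io/your-repo-name/api"; // placeholder

interface SummaryIndexItem {
  date: string;
  sourceLastUpdated: string;
  contentHash: string;
  path: string;
}

interface SummaryIndex {
  version: string;
  type: string;
  interval: string;
  generatedAt: string;
  items: SummaryIndexItem[];
}

// Naive in-memory cache keyed by URL; a real consumer might persist this to disk
const summaryCache = new Map<string, { contentHash: string; body: unknown }>();

async function getOverallSummaries(interval: "day" | "week" | "month"): Promise<unknown[]> {
  const indexUrl = `${SUMMARY_API_BASE}/summaries/overall/${interval}/index.json`;
  const index = (await (await fetch(indexUrl)).json()) as SummaryIndex;

  const summaries: unknown[] = [];
  for (const item of index.items) {
    const url = `${SUMMARY_API_BASE}/summaries/overall/${interval}/${item.path}`;
    const cached = summaryCache.get(url);
    if (cached && cached.contentHash === item.contentHash) {
      summaries.push(cached.body); // unchanged since the last fetch
      continue;
    }
    const body = await (await fetch(url)).json();
    summaryCache.set(url, { contentHash: item.contentHash, body });
    summaries.push(body);
  }
  return summaries;
}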

Backfilling JSON API

If you have existing summaries in the database that need JSON export (e.g., from before this feature was added):

# Export all summaries to JSON
bun run pipeline export-summaries

# Export specific type
bun run pipeline export-summaries -t overall
bun run pipeline export-summaries -t repository
bun run pipeline export-summaries -t contributor

# Export specific interval
bun run pipeline export-summaries --interval day
bun run pipeline export-summaries --interval week

# Dry run to see what would be exported
bun run pipeline export-summaries --dry-run

Database Management

# Generate database migration files
bun run db:generate

# Apply database migrations
bun run db:migrate

# Launch interactive database explorer
bun run db:studio

Website Generation

# Build and generate contributor profile pages
bun run build

# View the site
bunx serve@latest out

Automation Options

Local Automation

For simple local or server deployments without GitHub Actions:

# Continuous daily automation (runs every 24 hours)
./scripts/daily-automation.sh

This script runs the complete pipeline sequence (ingest → process → export → summarize) on a continuous 24-hour loop, which makes it useful for development environments or simple server setups. See scripts/README.md for more automation utilities.

CI/CD and Data Management

The project uses GitHub Actions for automated data processing, summary generation, and deployment. The system maintains separate branches for code and data to optimize Git history management.

GitHub Actions Workflows

  • Run Pipelines (run-pipelines.yml): Runs daily at 23:00 UTC to fetch GitHub data, process it, and generate summaries

    • Runs the full ingest → process → export → summarize pipeline chain
    • Maintains data in a dedicated _data branch
    • Can be manually triggered from the GitHub Actions tab with custom date ranges or forced regeneration
    • Runs repository and overall summaries daily, but only runs contributor summaries on Sundays
  • Deploy to GitHub Pages (deploy.yml): Builds and deploys the site

    • Triggered on push to main, manually, or after successful pipeline run
    • Restores data from the _data branch before building
    • Generates directory listings for the data folder
    • Deploys to GitHub Pages
  • PR Checks (pr-checks.yml): Quality checks for pull requests

    • Runs linting, typechecking, and build verification
    • Tests the pipeline on a small sample of data
    • Verifies migrations are up to date when schema changes

Data Management Architecture

The project uses a specialized data branch strategy to optimize both code and data storage:

  1. Separate Data Branch: All pipeline data is stored in a separate branch (default: _data)

    • Keeps the main branch clean and focused on code
    • Prevents data changes from cluttering code commits
    • Enables efficient data restoration in CI/CD and deployment
  2. Database Serialization: Uses the sqlite-diffable utility to store database content as version-controlled files

    • Converts SQLite database to diffable text files in data/dump/
    • Enables Git to track database changes efficiently
    • Provides an audit trail
    • Allows for database "time travel" via git history
  3. Custom GitHub Actions: Two custom actions are used in the workflows:

    • restore-db: Restores data from the data branch using sparse checkout
    • pipeline-data: Manages worktrees to retrieve and update data in the _data branch

This architecture ensures:

  • Efficient Git history management (code changes separate from data changes)
  • Reliable CI/CD workflows with consistent data access
  • Simplified deployment with automatic data restoration
  • Effective collaboration without data conflict issues

Deploying Your Own Instance

GitHub Pages Configuration

This project is configured to deploy to GitHub Pages with automatic base path detection. The deploy workflow automatically determines whether your repo is:

  • An organization/user site (username.github.io) → deploys to root path
  • A project site (any other repo name) → deploys to /${repo-name}
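
As a rough illustration of that rule (the authoritative detection lives in deploy.yml and may differ in detail):

// Illustrative only - sketches the base-path rule described above
function detectBasePath(repoFullName: string): string {
  const [owner, repo] = repoFullName.split("/");
  // Org/user sites such as "octocat/octocat.github.io" deploy at the root path
  if (repo.toLowerCase() === `${owner.toLowerCase()}.github.io`) {
    return "";
  }
  // Project sites deploy under /<repo-name>
  return `/${repo}`;
}

// detectBasePath("elizaOS/elizaos.github.io") => ""
// detectBasePath("your-username/your-repo-name") => "/your-repo-name"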

If you fork this repository:

  1. Enable GitHub Pages:

    • Go to repository Settings → Pages
    • Source: "GitHub Actions"
    • Save
  2. Add Required Secrets (Settings → Secrets and variables → Actions):

    • OPENROUTER_API_KEY - Required for AI summary generation
    • NEXT_PUBLIC_GITHUB_CLIENT_ID - Optional, for wallet linking
    • NEXT_PUBLIC_AUTH_WORKER_URL - Optional, for wallet linking
  3. Enable Workflows:

    • Go to Actions tab
    • Enable workflows if prompted
    • Manually trigger "Run Pipelines" to generate initial data
  4. Access Your Site:

    • After successful deployment: https://your-username.github.io/your-repo-name/

Deployment Architecture

The site automatically deploys via GitHub Actions:

  • Data Generation: run-pipelines.yml runs daily at 23:00 UTC
    • Stores data in _data branch (SQLite dumps, stats, summaries)
    • Never commits large binary files to main branch
  • Site Build: deploy.yml triggers on push to main or after pipeline runs
    • Auto-detects BASE_PATH and SITE_URL from repository name
    • Restores data from _data branch
    • Builds Next.js static site
    • Deploys to GitHub Pages

Note: To override auto-detection, set BASE_PATH and SITE_URL secrets in your repository settings.

Development

Taskmaster for AI-Assisted Development

The project is set up to work with Taskmaster, an AI-powered task management tool. You can use it directly via the task-master command-line interface (CLI) or through its MCP server for integration with development environments like Cursor.

MCP Setup (for IDE Integration)

To use Taskmaster's AI capabilities within an integrated development environment, you'll need to configure the MCP server. Add the following to your IDE's MCP settings file (e.g., .cursor/mcp.json in your project or a global user setting):

{
  "mcpServers": {
    "task-master-ai": {
      "command": "npx",
      "args": ["-y", "--package=task-master-ai", "task-master-ai"],
      "env": {
        "PERPLEXITY_API_KEY": "YOUR-KEY-HERE",
        "OPENROUTER_API_KEY": "YOUR-KEY-HERE"
      }
    }
  }
}

You can add other API keys for providers like Anthropic (ANTHROPIC_API_KEY) or Google (GOOGLE_API_KEY) to the env object. To use different models, you can configure them via the task-master models command after setup.

For more detailed guides, refer to the official Taskmaster documentation.

TypeScript Pipeline

The project uses a TypeScript-based pipeline for data processing. See Pipeline Documentation for detailed information about:

  • Basic usage and commands
  • Pipeline architecture and components
  • Configuration options
  • Creating custom pipelines
  • Available customization points

Updating the Schema

If you need to modify the database schema (in src/lib/data/schema.ts), follow these steps:

  1. Make your changes to the schema file
  2. Generate migration files:
bun run db:generate

This will create new migration files in the drizzle directory.

  3. Apply the migrations:
bun run db:migrate

This updates your local database with the new schema changes.
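
For illustration, a schema change might add a column to an existing table using drizzle-orm's SQLite helpers. This fragment is hypothetical and does not reflect the project's actual table definitions in src/lib/data/schema.ts:

// Illustrative fragment only - not the project's real schema
import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";

export const repositories = sqliteTable("repositories", {
  id: text("id").primaryKey(),
  owner: text("owner").notNull(),
  name: text("name").notNull(),
  // Hypothetical new column added as part of a schema change:
  starCount: integer("star_count").default(0),
});

// After editing the schema, run bun run db:generate to create the migration,
// then bun run db:migrate to apply it locally.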

Working with Migrations

During development, you might create several migration files as you iterate on your schema. Before submitting a pull request, it's best practice to squash these into a single, clean migration.

Squashing Migrations for a Pull Request

  1. Identify Your New Migrations: Take note of the new migration files you've added in your branch. These are the files you will consolidate.

  2. Delete Your New Migration Files: Remove the migration files (drizzle/*.sql), the corresponding snapshots (drizzle/meta/####_snapshot.json), and the entries in drizzle/meta/_journal.json that were created by running db:generate.

  3. Generate a Single, Consolidated Migration: Run the db:generate command again. This will create one new migration file that contains all of your schema changes.

    bun run db:generate
  4. Apply the New Migration: Run the migrate command to apply the squashed migration to your local database.

    bun run db:migrate
  5. Re-ingest Data (If Necessary): If your schema changes impact how data is structured, you may need to re-ingest data to reflect those changes correctly.

    # Example: Force re-ingestion for the last 7 days
    bun run pipeline ingest --days 7 --force

Handling Migration Errors

If you encounter errors during the migration process (e.g., "table already exists" or "no such column"), your local database may be out of sync. The most reliable way to fix this is to start fresh by resetting your local data and applying all migrations in order.

  1. Reset your local database:
    rm data/db.sqlite
  2. Sync with production data: This gives you a clean, production-like state.
    bun run data:sync -y --remote upstream
  3. Apply your new migration:
    bun run db:migrate

Database Explorer

To interactively explore the database and its contents:

bun run db:studio

This launches Drizzle Studio, which provides a visual interface to browse tables, relationships, run queries, and export data.

Additional setup is required if you use Safari or Brave: https://orm.drizzle.team/docs/drizzle-kit-studio#safari-and-brave-support

Troubleshooting

Common Issues

  1. "GITHUB_TOKEN environment variable is required"

    • Ensure your GitHub token is set in .env and the environment is loaded
    • You can also run commands with the token directly: GITHUB_TOKEN=your_token bun run pipeline ingest -d 10
    • GitHub Personal Access Token permissions:
      • Contents: Read and write
      • Metadata: (auto-enabled)
      • Actions: Read and write
      • Pages: Read and write
  2. "No such table: repositories"

    • Run bun run db:generate and bun run db:migrate to initialize the database
    • Ensure the data directory exists: mkdir -p data
  3. "Error fetching data from GitHub"

    • Check your GitHub token has proper permissions
    • Verify repository names are correct in config
    • Ensure your token has not expired

Debugging

For more detailed logs, add the -v or --verbose flag to any command:

bun run pipeline ingest -d 10 -v

Directory Structure

.
├── data/               # Generated data and reports
│   └── db.sqlite       # SQLite database
├── cli/                # CLI program for pipeline
│   └── analyze-pipeline.ts  # Run typescript pipeline
├── config/             # Configuration files
│   └── pipeline.config.ts  # TypeScript pipeline configuration
├── drizzle/            # Database migration files
├── scripts/            # Utility scripts (see scripts/README.md)
│   ├── daily-automation.sh    # Continuous daily pipeline automation
│   ├── verify-data.sh         # Data quality verification
│   └── fetch_github.py        # GitHub metrics collection
├── src/
│   ├── app/            # Next.js app router pages
│   ├── components/     # React components
│   │   └── ui/         # shadcn/ui components
│   │
│   └── lib/
│       ├── pipelines/  # Modular pipeline system
│       │   ├── contributors/  # Contributor-specific pipeline components
│       │   ├── export/        # Pipelines to export JSON data
│       │   ├── ingest/        # Data ingestion pipeline components
│       │   ├── summarize/     # Pipelines to generate AI summaries
│       ├── data/          # Data sources and storage
│       │   ├── db.ts      # Database connection and configuration
│       │   ├── github.ts  # GitHub API integration
│       │   ├── ingestion.ts  # Data ingestion from GitHub API
│       │   ├── schema.ts  # Database schema definitions
│       │   └── types.ts   # Core data type definitions
│       ├── logger.ts      # Logging system
│       └── typeHelpers.ts # TypeScript helper utilities
├── profiles/           # Generated static profiles
└── .github/workflows   # Automation workflows

License

This project is licensed under the MIT License - see the LICENSE file for details.
