AI News Aggregator

A modular TypeScript-based news aggregator that collects, enriches, and analyzes AI-related content from multiple sources.

Features

Modular Plugin System
Easily extendable with plugins for data sources, AI processing, content enrichment, summary generation, and storage.
Diverse Data Sources
Pre-built plugins for:
- Discord (raw messages, user details, AI-summarized conversations, media download)
- GitHub (repository statistics, contributor activity)
- Cryptocurrency Analytics (Solana via DexScreener, general tokens via Codex API, market data via CoinGecko)
- Generic APIs (configurable for various REST endpoints)
AI-Powered Processing
- Automated content summarization (e.g., daily reports, Discord channel activity) using configurable AI providers (OpenAI, OpenRouter).
- Token limit resilience with automatic fallback models for large content processing.
- Optional content enrichment (e.g., topic extraction, image generation).
Flexible Storage & Output
- SQLite for persistent storage of fetched content and generated summaries.
- Customizable data export (e.g., raw daily Discord data as JSON).
- Generation of summaries in JSON and Markdown formats.
Historical Data Processing
Dedicated script (historical.ts) for fetching and processing data from past dates or ranges.
Configuration Driven
Behavior controlled by JSON configuration files and environment variables.

Prerequisites

Node.js ≥ 18 (v23 recommended)
TypeScript
SQLite3 (command-line tool required for integrity checks)
npm

Installation

git clone https://github.com/yourusername/ai-news.git
cd ai-news
npm install
cp .env.example .env

Configuration

Use JSON files in the config/ directory and a .env file for secrets.

Example `.env` File

OPENAI_API_KEY=
USE_OPENROUTER=true
SITE_URL=your_site.com
SITE_NAME=YourAppName

DISCORD_TOKEN=
DISCORD_GUILD_ID=

CODEX_API_KEY=

GitHub Actions Secrets

Create three repository secrets in GitHub:

ENV_SECRETS - JSON object with credentials:

{
  "OPENAI_API_KEY": "sk-...",
  "USE_OPENROUTER": "true",
  "SITE_URL": "your_site.com",
  "SITE_NAME": "YourAppName",
  "DISCORD_APP_ID": "your_discord_app_id",
  "DISCORD_TOKEN": "your_discord_bot_token",
  "DISCORD_GUILD_ID": "your_discord_guild_id",
  "CODEX_API_KEY": "your_codex_key"
}

SQLITE_ENCRYPTION_KEY - strong password to encrypt the database.
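
Since ENV_SECRETS packs every credential into one JSON object, a script that consumes it has to unpack that object into individual variables. The following is only a minimal sketch of that step. The helper name `applyEnvSecrets` is hypothetical, and nothing here is taken from the project's actual workflows.

```typescript
// Hypothetical helper (not part of this repository): copies each key of an
// ENV_SECRETS-style JSON object into a target environment map.
function applyEnvSecrets(
  rawJson: string,
  env: Record<string, string | undefined>
): string[] {
  const parsed: unknown = JSON.parse(rawJson);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("ENV_SECRETS must be a JSON object");
  }
  const applied: string[] = [];
  for (const [key, value] of Object.entries(parsed)) {
    // The example above uses string values only (note "true" in quotes).
    if (typeof value !== "string") {
      throw new Error(`Value for ${key} must be a string`);
    }
    env[key] = value;
    applied.push(key);
  }
  return applied;
}

// Example with placeholder values; in Node.js the target would be process.env.
const targetEnv: Record<string, string | undefined> = {};
const appliedKeys = applyEnvSecrets(
  '{"USE_OPENROUTER": "true", "SITE_NAME": "YourAppName"}',
  targetEnv
);
console.log(appliedKeys);
```

Rejecting non-string values early makes a mistake such as an unquoted true in the secret visible immediately rather than at first use.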

For Webhook Server Integration (deploy-media-collection.yml)

COLLECT_WEBHOOK_URL - Your webhook server endpoint:

https://your-server.com/run-collect

COLLECT_WEBHOOK_SECRET - HMAC signing secret (generate with openssl rand -hex 32):

a1b2c3d4e5f6...

Running the Application

npm run build
npm start
npm start -- --source=elizaos.json

Historical Data Fetching

npm run historical -- --source=elizaos.json --output=./output
npm run historical -- --source=hyperfy-discord.json --after=2024-01-10 --before=2024-01-16 --output=./output/hyperfy
npm run historical -- --source=discord-raw.json --after=2024-01-15 --output=./output/discord
npm run historical -- --source=discord-raw.json --before=2024-01-10 --output=./output/discord
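
The --after and --before flags bound a range of days. To illustrate how such a range might expand into individual dates, here is a minimal sketch. The function name `listDates` is hypothetical, and treating both bounds as inclusive is an assumption; historical.ts may handle the boundaries differently.

```typescript
// Hypothetical helper: expands a YYYY-MM-DD range into one entry per UTC day.
// Inclusive bounds are an assumption, not confirmed behavior of historical.ts.
function listDates(after: string, before: string): string[] {
  const startMs = new Date(`${after}T00:00:00Z`).getTime();
  const endMs = new Date(`${before}T00:00:00Z`).getTime();
  if (Number.isNaN(startMs) || Number.isNaN(endMs)) {
    throw new Error("Dates must use the YYYY-MM-DD format");
  }
  const dayMs = 24 * 60 * 60 * 1000;
  const days: string[] = [];
  for (let t = startMs; t <= endMs; t += dayMs) {
    // toISOString() is always UTC, so slicing keeps the calendar date stable.
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}

const exampleRange = listDates("2024-01-10", "2024-01-16");
console.log(exampleRange);
```

Working in UTC throughout avoids off-by-one days when the machine's local time zone differs from the one used for the stored data.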

Channel Management

Channel Discovery

Automatically discover and track Discord channels across all configured servers:

# Generate channel checklist (runs daily via GitHub Action)
npm run discover-channels

# Test mode (validate configs without Discord API)
npm run discover-channels -- --test-configs

# With media download
npm run historical -- --source=elizaos.json --download-media=true --date=2024-01-15

Media Download

npm run download-media                                    # Today's media
npm run download-media -- --date=2024-01-15             # Specific date
npm run download-media -- --start=2024-01-10 --end=2024-01-15  # Date range

📋 Channel Checklist: View and edit tracked channels at scripts/CHANNELS.md

Configuration Updates

Update configs based on checked channels in the checklist:

# Apply changes from checklist to config files
npm run update-configs

# Preview changes without applying
npm run update-configs -- --dry-run
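
To make the checklist step concrete, here is a minimal sketch of reading checked and unchecked boxes from a Markdown task list. It assumes the checklist uses standard `- [x]` / `- [ ]` items; the exact line format of scripts/CHANNELS.md and the names `ChecklistEntry` and `parseChecklist` are assumptions, not the project's actual implementation.

```typescript
// Hypothetical shape for one checklist line.
interface ChecklistEntry {
  checked: boolean;
  label: string;
}

// Extracts task-list items ("- [x] text" or "- [ ] text") from Markdown.
// Lines that are not task-list items (headings, prose) are skipped.
function parseChecklist(markdown: string): ChecklistEntry[] {
  const taskLine = /^\s*[-*]\s+\[( |x|X)\]\s+(.*)$/;
  const entries: ChecklistEntry[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const match = taskLine.exec(line);
    if (match) {
      const [, mark = "", label = ""] = match;
      entries.push({ checked: mark.toLowerCase() === "x", label: label.trim() });
    }
  }
  return entries;
}

// Placeholder content; real channel lines may carry more detail.
const sampleChecklist = [
  "## Example Server",
  "- [x] general",
  "- [ ] off-topic",
  "- [X] announcements",
].join("\n");

const trackedLabels = parseChecklist(sampleChecklist)
  .filter((entry) => entry.checked)
  .map((entry) => entry.label);
console.log(trackedLabels);
```

A dry run, in this picture, would compute the same list and print the difference against the current config instead of writing it.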

Workflow Options

Option A: GitHub Web Interface (Automated)

Open scripts/CHANNELS.md on GitHub
Edit file and check/uncheck channel boxes
Commit changes
GitHub Action automatically runs update-configs and commits any config changes

Option B: Local Development

Run npm run discover-channels to update checklist
Edit scripts/CHANNELS.md locally to check/uncheck channels
Run npm run update-configs to update config files
Commit and push changes

Option C: Manual GitHub Workflow

Open scripts/CHANNELS.md on GitHub and edit
Commit changes → Pull locally: git pull
Apply updates: npm run update-configs

Server Deployment

For running data collection on a server instead of GitHub Actions (recommended for media downloads due to file size limits):

Webhook Server Setup

Clone repository to server: git clone <repo> ~/ai-news
Install dependencies: cd ~/ai-news && npm install && npm run build
Copy .env.example to .env and configure with your API keys
Generate webhook secret: openssl rand -hex 32

Start webhook server:

export COLLECT_WEBHOOK_SECRET="your-generated-secret"
npm run webhook

Set up a reverse proxy (Nginx/Caddy) with HTTPS for production

Usage

Webhook Server:

# Start server (listens on localhost:3000)
export COLLECT_WEBHOOK_SECRET="your-secret"
npm run webhook

# Test webhook locally
./scripts/test-webhook.sh elizaos.json 2025-01-15

Manual Collection (Alternative):

# Direct script execution
./scripts/collect-daily.sh elizaos.json
./scripts/collect-daily.sh hyperfy-discord.json 2025-01-15

GitHub Actions Integration:

Configure COLLECT_WEBHOOK_URL and COLLECT_WEBHOOK_SECRET in GitHub Secrets
GitHub Actions sends HMAC-signed webhook requests daily at 6 AM UTC
View/trigger manual runs at Actions > Daily Media Collection
No SSH keys or server access needed

Benefits of Webhook Approach:

No SSH complexity or key management
Secure HMAC signature verification
No GitHub file size limits for media downloads
GitHub Actions provides scheduling and monitoring
Simple HTTP-based integration
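
The HMAC signing and verification mentioned above can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions, not the server's actual code: the choice of SHA-256, the hex encoding, the payload shape, and the function names `signPayload` and `verifySignature` are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: computes a hex HMAC-SHA256 digest over the raw request body.
function signPayload(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Receiver side: recomputes the digest and compares in constant time.
function verifySignature(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Hypothetical payload; the real request format may differ.
const exampleBody = JSON.stringify({ config: "elizaos.json", date: "2025-01-15" });
const exampleSecret = "your-generated-secret";
const exampleSignature = signPayload(exampleBody, exampleSecret);
console.log(verifySignature(exampleBody, exampleSignature, exampleSecret));
```

The signature must be computed over the exact bytes received; re-serializing parsed JSON before verifying can change whitespace or key order and cause valid requests to fail.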

Media Download

Discord media files (images, videos, attachments) can be downloaded to a VPS using a manifest-based approach.

How It Works

GitHub Actions generates a media-manifest.json with URLs during daily runs
Manifest is deployed to gh-pages branch
VPS script fetches manifest and downloads files

Generate Manifest Locally

No API calls - reads directly from existing database:

# Single date
npm run generate-manifest -- --db data/elizaos.sqlite --date 2024-12-14 --source elizaos

# Date range (for backfill)
npm run generate-manifest -- --db data/elizaos.sqlite --start 2024-12-01 --end 2024-12-14 --source elizaos

# Custom output path
npm run generate-manifest -- --db data/elizaos.sqlite --date 2024-12-14 --source elizaos --manifest-output ./my-manifest.json

# View manifest contents
cat ./output/elizaos/media-manifest.json | jq '.stats'

Manifest Contents

Each manifest entry includes full Discord metadata for querying:

# Filter by user
cat manifest.json | jq '.files[] | select(.user_id == "123456789")'

# Only direct attachments (no embeds)
cat manifest.json | jq '.files[] | select(.media_type == "attachment")'

# Files with reactions
cat manifest.json | jq '.files[] | select(.reactions != null)'

# Count per user
cat manifest.json | jq '[.files[].user_id] | group_by(.) | map({user: .[0], count: length}) | sort_by(-.count)'

Fields: url, filename, user_id, guild_id, channel_id, message_id, message_content, reactions, media_type, content_type, width, height, size, proxy_url
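
The same queries can be expressed in TypeScript once the manifest is loaded. The sketch below types only a subset of the fields listed above; the field types, the top-level `files` array (inferred from the jq examples), and the helper name `countPerUser` are assumptions rather than the project's published schema.

```typescript
// Partial, assumed shape of a manifest entry (subset of the listed fields).
interface ManifestFile {
  url: string;
  filename: string;
  user_id: string;
  media_type: string;
  reactions?: unknown;
}

interface MediaManifest {
  files: ManifestFile[];
}

// Mirrors the "count per user" jq query: groups by user_id, sorts descending.
function countPerUser(manifest: MediaManifest): { user: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const file of manifest.files) {
    counts.set(file.user_id, (counts.get(file.user_id) ?? 0) + 1);
  }
  return Array.from(counts, ([user, count]) => ({ user, count })).sort(
    (a, b) => b.count - a.count
  );
}

// Placeholder data for illustration only.
const sampleManifest: MediaManifest = {
  files: [
    { url: "https://example.com/a.png", filename: "a.png", user_id: "111", media_type: "attachment" },
    { url: "https://example.com/b.png", filename: "b.png", user_id: "222", media_type: "embed" },
    { url: "https://example.com/c.png", filename: "c.png", user_id: "222", media_type: "attachment" },
  ],
};

// Mirrors the "only direct attachments" jq query.
const attachmentsOnly = sampleManifest.files.filter(
  (file) => file.media_type === "attachment"
);
console.log(attachmentsOnly.length, countPerUser(sampleManifest));
```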

VPS Setup

# Clone and setup
git clone https://github.com/M3-org/ai-news.git ~/ai-news-media
python3 ~/ai-news-media/scripts/media-sync.py setup

# Download media (from gh-pages manifests)
python3 ~/ai-news-media/scripts/media-sync.py sync --dry-run  # Preview
python3 ~/ai-news-media/scripts/media-sync.py sync            # Download
python3 ~/ai-news-media/scripts/media-sync.py sync --min-free 1000  # Stop if <1GB free

# Check status (disk usage and media sizes)
python3 ~/ai-news-media/scripts/media-sync.py status

The setup command installs a systemd timer that runs daily at 01:30 UTC.

Download with Fresh URLs

Discord CDN URLs expire after ~24 hours. Use refresh to fetch fresh URLs and download:

export DISCORD_TOKEN  # Bot token required

# Download all files for a specific user
python3 scripts/media-sync.py refresh manifest.json --user USER_ID -o ./user_media

# Only attachments (no embeds/thumbnails)
python3 scripts/media-sync.py refresh manifest.json --user USER_ID --type attachment

# Preview without downloading
python3 scripts/media-sync.py refresh manifest.json --user USER_ID --dry-run
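
Whether a stored URL needs refreshing can often be decided before attempting a download. The sketch below assumes the URL carries an `ex` query parameter holding the expiry time as hexadecimal Unix seconds, which is how signed Discord CDN links are commonly observed to look. That format is an assumption here, as is the helper name `isCdnUrlExpired`; media-sync.py may use a different check.

```typescript
// Returns true when the URL's `ex` parameter (assumed: hex Unix seconds) is
// at or before the given time. URLs without a parsable `ex` value are treated
// as expired so that they get refreshed rather than silently failing.
function isCdnUrlExpired(rawUrl: string, nowMs: number = Date.now()): boolean {
  const ex = new URL(rawUrl).searchParams.get("ex");
  if (!ex || !/^[0-9a-f]+$/i.test(ex)) {
    return true;
  }
  return parseInt(ex, 16) * 1000 <= nowMs;
}

// Placeholder URL; the path and parameter values are made up.
const sampleCdnUrl = "https://cdn.example.com/attachments/1/2/image.png?ex=65a5c000";
console.log(isCdnUrlExpired(sampleCdnUrl));
```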

Manifest Location

After GitHub Actions runs, manifests are available at:

https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/elizaos/media-manifest.json
https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/hyperfy/media-manifest.json

Project Structure

.github/              GitHub Actions workflows
config/               Configuration files
data/                 Encrypted SQLite databases
docs/                 Docusaurus documentation
src/                  Source code
  aggregator/         Aggregators (ContentAggregator, HistoricalAggregator)
  plugins/            Plugins (sources, enrichers, generators, storage)
  helpers/            Utility functions
  types.ts            Type definitions
  index.ts            Main entry point
  historical.ts       Historical data runner
.env.example          Template .env
README.md             This file

Adding New Sources

Create a new class in src/plugins/sources/ implementing ContentSource:

import { ContentItem } from "../../types";

export interface ContentSource {
  name: string;
  fetchItems(): Promise<ContentItem[]>;
  fetchHistorical?(date: string): Promise<ContentItem[]>;
}

Add the new source to the desired config file:

{
  "type": "YourNewSource",
  "name": "descriptive-name",
  "interval": 300,
  "params": {}
}
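
As an illustration of the two steps above, here is a minimal source that returns fixed notes instead of calling an external service. The class name `StaticNotesSource`, its config shape, the `type` value, and the use of Unix seconds for `date` are all hypothetical; how the aggregator instantiates plugins and passes `params` is not shown in this README.

```typescript
// Types restated from this README so the example compiles on its own.
interface ContentItem {
  cid: string;
  type: string;
  source: string;
  text?: string;
  date?: number;
  metadata?: { [key: string]: any };
}

interface ContentSource {
  name: string;
  fetchItems(): Promise<ContentItem[]>;
  fetchHistorical?(date: string): Promise<ContentItem[]>;
}

// Hypothetical config shape for this example source.
interface StaticNotesConfig {
  name: string;
  notes: string[];
}

class StaticNotesSource implements ContentSource {
  public name: string;
  private notes: string[];

  constructor(config: StaticNotesConfig) {
    this.name = config.name;
    this.notes = config.notes;
  }

  // Wraps each note in a ContentItem with a stable, source-scoped cid.
  async fetchItems(): Promise<ContentItem[]> {
    const now = Math.floor(Date.now() / 1000); // assumption: date is Unix seconds
    return this.notes.map((text, index) => ({
      cid: `${this.name}-${index}`,
      type: "staticNote",
      source: this.name,
      text,
      date: now,
    }));
  }
}

const exampleSource = new StaticNotesSource({
  name: "descriptive-name",
  notes: ["First note", "Second note"],
});
exampleSource.fetchItems().then((items) => console.log(items.length));
```

A real source would replace the body of fetchItems with calls to its upstream API and, where past data is available, also implement the optional fetchHistorical.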

Contributing

git checkout -b feature/YourFeature
git commit -m "Add YourFeature"
git push origin feature/YourFeature

License

MIT

Core Data Structures

`ContentItem`

{
  cid: string;
  type: string;
  source: string;
  text?: string;
  date?: number;
  metadata?: { [key: string]: any };
}

`SummaryItem`

{
  type: string;
  title?: string;
  categories?: string;
  markdown?: string;
  date?: number;
}
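
To show how the two structures relate, the sketch below rolls a list of ContentItem values into one SummaryItem as a plain Markdown bullet list, with no AI step involved. The helper name `toSummary`, the `type` value, and storing `categories` as a JSON-encoded list of sources are assumptions made for illustration.

```typescript
// Shapes restated from the definitions above so the sketch is self-contained.
interface ContentItem {
  cid: string;
  type: string;
  source: string;
  text?: string;
  date?: number;
  metadata?: { [key: string]: any };
}

interface SummaryItem {
  type: string;
  title?: string;
  categories?: string;
  markdown?: string;
  date?: number;
}

// Hypothetical helper: builds a simple digest from items that carry text.
function toSummary(items: ContentItem[], date: number): SummaryItem {
  const withText = items.filter(
    (item) => item.text !== undefined && item.text !== ""
  );
  const sources = Array.from(new Set(withText.map((item) => item.source)));
  return {
    type: "exampleDigest",
    title: `Digest of ${withText.length} item(s)`,
    categories: JSON.stringify(sources), // assumption: JSON-encoded list
    markdown: withText
      .map((item) => `- **${item.source}**: ${item.text}`)
      .join("\n"),
    date,
  };
}

const exampleDigest = toSummary(
  [
    { cid: "a", type: "message", source: "discord", text: "hello" },
    { cid: "b", type: "stats", source: "github" },
  ],
  1705276800
);
console.log(exampleDigest.markdown);
```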

Supported Source Types

Discord: DiscordRawDataSource, DiscordChannelSource, DiscordAnnouncementSource
GitHub: GitHubStatsDataSource, GitHubDataSource
Crypto: CodexAnalyticsSource, CoinGeckoAnalyticsSource, SolanaAnalyticsSource
Generic: ApiSource

Scheduled Tasks

GitHub Actions workflows in .github/workflows/ automate scheduled processing for data sources and summary generation.
