A modular TypeScript-based news aggregator that collects, enriches, and analyzes AI-related content from multiple sources.
## Features

- **Modular Plugin System**: Easily extendable with plugins for data sources, AI processing, content enrichment, summary generation, and storage.
- **Diverse Data Sources**: Pre-built plugins for:
  - Discord (raw messages, user details, AI-summarized conversations, media download)
  - GitHub (repository statistics, contributor activity)
  - Cryptocurrency Analytics (Solana via DexScreener, general tokens via Codex API, market data via CoinGecko)
  - Generic APIs (configurable for various REST endpoints)
- **AI-Powered Processing**:
  - Automated content summarization (e.g., daily reports, Discord channel activity) using configurable AI providers (OpenAI, OpenRouter).
  - Token-limit resilience, with automatic fallback models for processing large content.
  - Optional content enrichment (e.g., topic extraction, image generation).
- **Flexible Storage & Output**:
  - SQLite for persistent storage of fetched content and generated summaries.
  - Customizable data export (e.g., raw daily Discord data as JSON).
  - Summary generation in JSON and Markdown formats.
- **Historical Data Processing**: Dedicated script (`historical.ts`) for fetching and processing data from past dates or date ranges.
- **Configuration Driven**: Behavior controlled by JSON configuration files and environment variables.

## Requirements
- Node.js ≥ 18 (v23 recommended)
- TypeScript
- SQLite3 (command-line tool required for integrity checks)
- npm
## Installation

```bash
git clone https://github.com/yourusername/ai-news.git
cd ai-news
npm install
cp example.env .env
```

## Configuration

Use JSON files in the `config/` directory and a `.env` file for secrets.
```bash
OPENAI_API_KEY=
USE_OPENROUTER=true
SITE_URL=your_site.com
SITE_NAME=YourAppName
DISCORD_TOKEN=
DISCORD_GUILD_ID=
CODEX_API_KEY=
```

Create the following repository secrets in GitHub:
- `ENV_SECRETS`: JSON object with credentials:

  ```json
  {
    "OPENAI_API_KEY": "sk-...",
    "USE_OPENROUTER": "true",
    "SITE_URL": "your_site.com",
    "SITE_NAME": "YourAppName",
    "DISCORD_APP_ID": "your_discord_app_id",
    "DISCORD_TOKEN": "your_discord_bot_token",
    "DISCORD_GUILD_ID": "your_discord_guild_id",
    "CODEX_API_KEY": "your_codex_key"
  }
  ```

- `SQLITE_ENCRYPTION_KEY`: strong password used to encrypt the database.
- `COLLECT_WEBHOOK_URL`: your webhook server endpoint, e.g. `https://your-server.com/run-collect`
- `COLLECT_WEBHOOK_SECRET`: HMAC signing secret (generate with `openssl rand -hex 32`), e.g. `a1b2c3d4e5f6...`
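As a rough sketch of how `COLLECT_WEBHOOK_SECRET` is used on the sending side, the request body can be HMAC-signed before being POSTed to the webhook endpoint. This assumes the signature is a hex-encoded SHA-256 HMAC of the raw JSON body carried in a request header; the header name (`x-signature`) and payload shape here are assumptions, so check the workflow and webhook code for the exact scheme:

```typescript
import { createHmac } from "node:crypto";

// Hex-encoded SHA-256 HMAC of the raw request body.
function sign(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Hypothetical trigger call; "x-signature" is an assumed header name.
async function triggerCollect(url: string, secret: string, source: string) {
  const body = JSON.stringify({ source });
  return fetch(url, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-signature": sign(body, secret),
    },
    body,
  });
}
```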
## Usage

```bash
npm run build
npm start
npm start -- --source=elizaos.json
```

Historical data:

```bash
npm run historical -- --source=elizaos.json --output=./output
npm run historical -- --source=hyperfy-discord.json --after=2024-01-10 --before=2024-01-16 --output=./output/hyperfy
npm run historical -- --source=discord-raw.json --after=2024-01-15 --output=./output/discord
npm run historical -- --source=discord-raw.json --before=2024-01-10 --output=./output/discord
```

### Channel Discovery

Automatically discover and track Discord channels across all configured servers:
```bash
# Generate channel checklist (runs daily via GitHub Action)
npm run discover-channels

# Test mode (validate configs without Discord API)
npm run discover-channels -- --test-configs

# With media download
npm run historical -- --source=elizaos.json --download-media=true --date=2024-01-15
```

### Media Download

```bash
npm run download-media                                          # Today's media
npm run download-media -- --date=2024-01-15                     # Specific date
npm run download-media -- --start=2024-01-10 --end=2024-01-15   # Date range
```

**Channel Checklist:** View and edit tracked channels at `scripts/CHANNELS.md`.
Update configs based on checked channels in the checklist:

```bash
# Apply changes from checklist to config files
npm run update-configs

# Preview changes without applying
npm run update-configs -- --dry-run
```

**Option A: GitHub Web Interface (Automated)**
- Open `scripts/CHANNELS.md` on GitHub
- Edit the file and check/uncheck channel boxes
- Commit changes
- A GitHub Action automatically runs `update-configs` and commits any config changes
**Option B: Local Development**

- Run `npm run discover-channels` to update the checklist
- Edit `scripts/CHANNELS.md` locally to check/uncheck channels
- Run `npm run update-configs` to update config files
- Commit and push changes

**Option C: Manual GitHub Workflow**

- Open `scripts/CHANNELS.md` on GitHub and edit
- Commit changes, then pull locally: `git pull`
- Apply updates: `npm run update-configs`
### Webhook Server Setup

For running data collection on a server instead of GitHub Actions (recommended for media downloads due to file size limits):

- Clone the repository to the server: `git clone <repo> ~/ai-news`
- Install dependencies: `cd ~/ai-news && npm install && npm run build`
- Copy `example.env` to `.env` and configure it with your API keys
- Generate a webhook secret: `openssl rand -hex 32`
- Start the webhook server:

  ```bash
  export COLLECT_WEBHOOK_SECRET="your-generated-secret"
  npm run webhook
  ```

- Set up a reverse proxy (Nginx/Caddy) with HTTPS for production
**Webhook Server:**

```bash
# Start server (listens on localhost:3000)
export COLLECT_WEBHOOK_SECRET="your-secret"
npm run webhook

# Test webhook locally
./scripts/test-webhook.sh elizaos.json 2025-01-15
```

**Manual Collection (Alternative):**

```bash
# Direct script execution
./scripts/collect-daily.sh elizaos.json
./scripts/collect-daily.sh hyperfy-discord.json 2025-01-15
```

**GitHub Actions Integration:**
- Configure `COLLECT_WEBHOOK_URL` and `COLLECT_WEBHOOK_SECRET` in GitHub Secrets
- GitHub Actions sends HMAC-signed webhook requests daily at 6 AM UTC
- View or trigger manual runs at Actions > Daily Media Collection
- No SSH keys or server access needed
**Benefits of the Webhook Approach:**
- No SSH complexity or key management
- Secure HMAC signature verification
- No GitHub file size limits for media downloads
- GitHub Actions provides scheduling and monitoring
- Simple HTTP-based integration
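To illustrate what the receiving end does, here is a minimal HMAC-verifying webhook receiver using only Node's standard library. This is a sketch, not the project's implementation (which is started with `npm run webhook`); the header name, port, and payload handling are assumptions:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { createServer } from "node:http";

// Constant-time comparison of the received signature against a freshly
// computed hex SHA-256 HMAC of the raw request body.
function validSignature(body: string, signature: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(body).digest("hex");
  if (expected.length !== signature.length) return false;
  return timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
}

const secret = process.env.COLLECT_WEBHOOK_SECRET ?? "change-me";

const server = createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const signature = String(req.headers["x-signature"] ?? "");
    if (!validSignature(body, signature, secret)) {
      res.writeHead(401).end("invalid signature");
      return;
    }
    // A real server would kick off collection here,
    // e.g. spawn ./scripts/collect-daily.sh with the requested source.
    res.writeHead(200).end("ok");
  });
});

// server.listen(3000, "127.0.0.1");
```

Rejecting on length mismatch before `timingSafeEqual` avoids the exception that function throws on unequal-length buffers.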
### Discord Media Downloads

Discord media files (images, videos, attachments) can be downloaded to a VPS using a manifest-based approach:

- GitHub Actions generates a `media-manifest.json` with URLs during daily runs
- The manifest is deployed to the gh-pages branch
- A VPS script fetches the manifest and downloads the files
**Manifest Generation**

No API calls; the generator reads directly from the existing database:

```bash
# Single date
npm run generate-manifest -- --db data/elizaos.sqlite --date 2024-12-14 --source elizaos

# Date range (for backfill)
npm run generate-manifest -- --db data/elizaos.sqlite --start 2024-12-01 --end 2024-12-14 --source elizaos

# Custom output path
npm run generate-manifest -- --db data/elizaos.sqlite --date 2024-12-14 --source elizaos --manifest-output ./my-manifest.json

# View manifest contents
cat ./output/elizaos/media-manifest.json | jq '.stats'
```

**Querying the Manifest**

Each manifest entry includes full Discord metadata for querying:
```bash
# Filter by user
cat manifest.json | jq '.files[] | select(.user_id == "123456789")'

# Only direct attachments (no embeds)
cat manifest.json | jq '.files[] | select(.media_type == "attachment")'

# Files with reactions
cat manifest.json | jq '.files[] | select(.reactions != null)'

# Count per user
cat manifest.json | jq '[.files[].user_id] | group_by(.) | map({user: .[0], count: length}) | sort_by(-.count)'
```

Fields: `url`, `filename`, `user_id`, `guild_id`, `channel_id`, `message_id`, `message_content`, `reactions`, `media_type`, `content_type`, `width`, `height`, `size`, `proxy_url`
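For queries that outgrow jq, the same manifest can be processed in TypeScript. A small sketch (field names follow the list above; the per-user count mirrors the last jq example):

```typescript
// Subset of the manifest entry fields listed above.
interface ManifestFile {
  url: string;
  filename: string;
  user_id: string;
  media_type: string; // e.g. "attachment" or "embed"
  size?: number;
}

// Count files per user, sorted descending by count.
function countByUser(files: ManifestFile[]): { user: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const f of files) counts.set(f.user_id, (counts.get(f.user_id) ?? 0) + 1);
  return [...counts.entries()]
    .map(([user, count]) => ({ user, count }))
    .sort((a, b) => b.count - a.count);
}

// Usage with a manifest loaded from disk:
// const manifest = JSON.parse(fs.readFileSync("manifest.json", "utf8"));
// console.log(countByUser(manifest.files));
```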
**VPS Setup**

```bash
# Clone and setup
git clone https://github.com/M3-org/ai-news.git ~/ai-news-media
python3 ~/ai-news-media/scripts/media-sync.py setup

# Download media (from gh-pages manifests)
python3 ~/ai-news-media/scripts/media-sync.py sync --dry-run        # Preview
python3 ~/ai-news-media/scripts/media-sync.py sync                  # Download
python3 ~/ai-news-media/scripts/media-sync.py sync --min-free 1000  # Stop if <1GB free

# Check status (disk usage and media sizes)
python3 ~/ai-news-media/scripts/media-sync.py status
```

The `setup` command installs a systemd timer that runs daily at 01:30 UTC.
**Refreshing Expired URLs**

Discord CDN URLs expire after ~24 hours. Use `refresh` to fetch fresh URLs and download:

```bash
export DISCORD_TOKEN  # Bot token required

# Download all files for a specific user
python3 scripts/media-sync.py refresh manifest.json --user USER_ID -o ./user_media

# Only attachments (no embeds/thumbnails)
python3 scripts/media-sync.py refresh manifest.json --user USER_ID --type attachment

# Preview without downloading
python3 scripts/media-sync.py refresh manifest.json --user USER_ID --dry-run
```

After GitHub Actions runs, manifests are available at:

- https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/elizaos/media-manifest.json
- https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/hyperfy/media-manifest.json
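A small TypeScript sketch for pulling a published manifest with Node 18+'s built-in `fetch`; the `.stats` field is assumed from the jq example earlier:

```typescript
// Build the raw.githubusercontent.com URL for a source's published manifest.
function manifestUrl(source: string): string {
  return `https://raw.githubusercontent.com/M3-org/ai-news/gh-pages/${source}/media-manifest.json`;
}

// Fetch a manifest and return its stats block (shape assumed).
async function fetchManifestStats(source: string): Promise<unknown> {
  const res = await fetch(manifestUrl(source));
  if (!res.ok) throw new Error(`Failed to fetch manifest: ${res.status}`);
  const manifest = await res.json();
  return manifest.stats;
}
```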
## Project Structure

```
.github/          GitHub Actions workflows
config/           Configuration files
data/             Encrypted SQLite databases
docs/             Docusaurus documentation
src/              Source code
  aggregator/     Aggregators (ContentAggregator, HistoricalAggregator)
  plugins/        Plugins (sources, enrichers, generators, storage)
  helpers/        Utility functions
  types.ts        Type definitions
  index.ts        Main entry point
  historical.ts   Historical data runner
example.env       Template .env
README.md         This file
```
## Adding a New Source

- Create a new class in `src/plugins/sources/` implementing `ContentSource`:

  ```typescript
  import { ContentItem } from "../../types";

  export interface ContentSource {
    name: string;
    fetchItems(): Promise<ContentItem[]>;
    fetchHistorical?(date: string): Promise<ContentItem[]>;
  }
  ```

- Add the new source to the desired config file:
  ```json
  {
    "type": "YourNewSource",
    "name": "descriptive-name",
    "interval": 300,
    "params": {}
  }
  ```

## Contributing

```bash
git checkout -b feature/YourFeature
git commit -m "Add YourFeature"
git push origin feature/YourFeature
```

## License

MIT
## Data Structures

`ContentItem`:

```typescript
{
  cid: string;
  type: string;
  source: string;
  text?: string;
  date?: number;
  metadata?: { [key: string]: any };
}
```

`SummaryItem`:

```typescript
{
  type: string;
  title?: string;
  categories?: string;
  markdown?: string;
  date?: number;
}
```

Available source plugins:

- Discord: `DiscordRawDataSource`, `DiscordChannelSource`, `DiscordAnnouncementSource`
- GitHub: `GitHubStatsDataSource`, `GitHubDataSource`
- Crypto: `CodexAnalyticsSource`, `CoinGeckoAnalyticsSource`, `SolanaAnalyticsSource`
- Generic: `ApiSource`
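As a sketch, a hypothetical custom source implementing the `ContentSource` interface shown earlier might look like the following. The interfaces are inlined here so the example is self-contained (in the project you would import `ContentItem` from `src/types.ts`), and the endpoint and feed field names are placeholders:

```typescript
interface ContentItem {
  cid: string;
  type: string;
  source: string;
  text?: string;
  date?: number;
  metadata?: { [key: string]: any };
}

interface ContentSource {
  name: string;
  fetchItems(): Promise<ContentItem[]>;
  fetchHistorical?(date: string): Promise<ContentItem[]>;
}

// Map one hypothetical feed post into the aggregator's ContentItem shape.
function toContentItem(
  sourceName: string,
  post: { id: string; body: string; ts: number }
): ContentItem {
  return {
    cid: `${sourceName}-${post.id}`, // unique content id
    type: "post",
    source: sourceName,
    text: post.body,
    date: post.ts,
  };
}

// Hypothetical source that polls a JSON endpoint.
class ExampleJsonFeedSource implements ContentSource {
  public name = "example-json-feed";
  constructor(private endpoint: string) {}

  async fetchItems(): Promise<ContentItem[]> {
    const res = await fetch(this.endpoint);
    const posts: { id: string; body: string; ts: number }[] = await res.json();
    return posts.map((p) => toContentItem(this.name, p));
  }
}
```

The corresponding config entry would then use `"type": "ExampleJsonFeedSource"` with the endpoint passed via `params`.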
GitHub Actions workflows in .github/workflows/ automate scheduled processing for data sources and summary generation.