A Telegram bot that automatically collects Syrian news from 30+ Telegram channels (government and official sources), using AI to summarize, translate, label, and prioritize the content before posting daily summaries with generated banner images in both English and Arabic.
📢 English Channel: @SyriaDailyEN
📢 Arabic Channel: @SyriaDailyAR
- Multi-channel Collection: Monitors 30+ configurable Telegram channels from channels.json
- AI-Powered Processing: Uses OpenAI or Anthropic models for summarization, translation, and deduplication
- 5-Stage Modular Pipeline: Separate Lambda functions for Collection → Early Deduplication → Summarization → Website Publishing → Telegram Posting
- State Management with DynamoDB: Tracks pipeline progress and ensures idempotent execution across all stages
- Multi-Round Deduplication: Early deduplication with round-robin redistribution across multiple rounds for maximum efficiency
- Parallel AI Processing: Batch summarization with up to 30 parallel batches of 20 items each
- Intelligent Deduplication: AI-powered merging of duplicate stories while preserving all sources
- Dynamic Banner Generation: Creates SVG-based banner images with category-specific backgrounds for 19+ news types
- Dual Language Support: Posts formatted summaries in both English and Arabic with language-specific banners
- Website Integration: Publishes to GitHub Pages website before posting to Telegram
- EventBridge Orchestration: Custom event-triggered Lambda functions for reliable, scalable pipeline execution
- Idempotency Guarantees: Each stage validates state before processing to prevent duplicate execution
- Local Development: Full local testing environment with caching system
- Damascus Timezone: Accurate 24-hour news collection based on local Syrian time
- Lightweight Scraping: Uses axios and JSDOM for efficient web content extraction
- ARM64 Optimization: Memory-efficient functions with integrated font rendering support
The system uses a modular pipeline where each stage is a separate Lambda function, orchestrated via custom EventBridge events:
Stage 1: Collection
- CollectFunction: Collects raw news posts from Telegram channels
- Scheduled execution at 20:01 UTC daily (23:01 Damascus time)
- Entry point: src/lambda/Collect.ts
- Timeout: 10 minutes for web scraping
- Memory: 1GB, ARM64 architecture
- State tracking: Initializes briefing in DynamoDB and records collection timestamp
- Output: collected-news/{date}.json → S3, emits NewsCollected event
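The schedule and the `{date}` part of the output key both depend on Damascus local time. As a minimal sketch, assuming `{date}` is a `YYYY-MM-DD` calendar date in Damascus (the format is not stated here), the key could be derived as follows. `damascusDate`, `collectionWindow`, and `collectedNewsKey` are hypothetical names, not the project's actual helpers in src/utils/dateUtils.ts.

```typescript
// Hypothetical helpers for illustration; the real logic in
// src/utils/dateUtils.ts may differ.

/** Calendar date in Damascus for a given instant, as YYYY-MM-DD (assumed format). */
function damascusDate(instant: Date): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "Asia/Damascus",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  // Pick out the numeric fields and ignore the literal separators.
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}

/** The 24 hours ending at the moment the collection stage runs. */
function collectionWindow(runAt: Date): { from: Date; to: Date } {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return { from: new Date(runAt.getTime() - DAY_MS), to: runAt };
}

/** S3 key for the raw collected posts of the run's Damascus date. */
function collectedNewsKey(runAt: Date): string {
  return `collected-news/${damascusDate(runAt)}.json`;
}

// A run at 20:01 UTC in July is still the same calendar day in Damascus,
// while 21:30 UTC is already past local midnight.
console.log(collectedNewsKey(new Date("2025-07-15T20:01:00Z")));
console.log(damascusDate(new Date("2025-07-15T21:30:00Z")));
```

Deriving the date from the time zone rather than from UTC keeps the key stable for runs that happen close to local midnight.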
Stage 2: Early Deduplication
- DeduplicateFunction: AI-powered early deduplication with multi-round processing
- Triggered by custom EventBridge event NewsCollected from Stage 1
- Entry point: src/lambda/Deduplicate.ts
- Timeout: 15 minutes for multi-round AI processing
- Memory: 1GB, ARM64 architecture
- State tracking: Validates briefing hasn't been deduplicated, records deduplication timestamp
- Uses 150-item batches with round-robin redistribution between rounds
- Processes up to 5 parallel requests per batch group
- Output: deduplicated-news/{date}.json → S3, emits NewsDeduplicated event
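Each stage validates the briefing state before doing any work and records a timestamp afterwards. A minimal sketch of that check, using an in-memory map as a stand-in for the DynamoDB table; the `Briefing` shape and the function names are hypothetical and do not reflect the actual schema in src/db/BriefingEntity.ts.

```typescript
// In-memory stand-in for the DynamoDB briefing record; field and function
// names here are hypothetical.
type Stage = "collected" | "deduplicated" | "summarized" | "published";

interface Briefing {
  date: string;
  // ISO timestamp per completed stage; a missing entry means "not done yet".
  completedAt: Partial<Record<Stage, string>>;
}

const briefings = new Map<string, Briefing>();

/** Stage 1 creates the record that later stages validate against. */
function initBriefing(date: string): Briefing {
  const existing = briefings.get(date);
  if (existing) return existing;
  const created: Briefing = { date, completedAt: {} };
  briefings.set(date, created);
  return created;
}

/** A stage may run only if the briefing exists and the stage has no timestamp. */
function shouldRun(date: string, stage: Stage): boolean {
  const briefing = briefings.get(date);
  return briefing !== undefined && briefing.completedAt[stage] === undefined;
}

/** Records completion so that a repeated trigger becomes a no-op. */
function markDone(date: string, stage: Stage, at: Date = new Date()): void {
  const briefing = briefings.get(date);
  if (!briefing) throw new Error(`No briefing for ${date}`);
  briefing.completedAt[stage] = at.toISOString();
}

/** Runs `work` at most once per (date, stage) and reports what happened. */
function runOnce(date: string, stage: Stage, work: () => void): "ran" | "skipped" {
  if (!shouldRun(date, stage)) return "skipped";
  work();
  markDone(date, stage);
  return "ran";
}
```

Against a real table, the check and the write would likely need to be combined into one conditional write so that two concurrent invocations cannot both pass the check.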
Stage 3: Summarization
- SummarizeFunction: AI-powered summarization and translation
- Triggered by custom EventBridge event NewsDeduplicated from Stage 2
- Entry point: src/lambda/Summarize.ts
- Timeout: 10 minutes for AI processing
- Memory: 512MB, ARM64 architecture
- State tracking: Validates briefing hasn't been summarized, records summarization timestamp
- Output: summarized-news/{date}.json → S3, emits NewsSummarized event
Stage 4: Website Publishing
- PublishToWebsiteFunction: Publishes news to GitHub Pages website
- Triggered by custom EventBridge event NewsSummarized from Stage 3
- Entry point: src/lambda/PublishToWebsite.ts
- Timeout: 1 minute for GitHub API operations
- Memory: 256MB, ARM64 architecture
- State tracking: Validates briefing hasn't been published, records publishing timestamp
- Emits custom EventBridge event summaries-published after publishing
Stage 5: Telegram Posting
- PostToTelegramEnglishFunction & PostToTelegramArabicFunction: Post formatted news with banners
- Triggered by custom EventBridge event from Stage 4 (same trigger for both)
- Entry point: src/lambda/PostToTelegram.ts
- Timeout: 1 minute for posting
- Memory: 512MB, ARM64 architecture with font rendering layers
- State tracking: Validates briefing hasn't been posted for this language, records post URL
- Fetches pre-composed banners from S3 and adds date overlay
- Local development: direct execution that runs the pipeline stages locally for testing
- Entry point: src/local/index.ts
- Tests both English and Arabic output in a single run
- Uses dotenv for environment variables
- Local caching system via the cache/ directory to avoid re-fetching and re-processing during development
- Local pipeline: Collect → Deduplicate → Summarize → Format → (Optionally) Post
- Node.js 22.x
- Yarn
- Git
- A way to tunnel your local server to the internet (e.g. ngrok)
- A testing Telegram channel to post the summaries to (you can use the same channel for both languages)
- A Telegram bot token (you can get it from @BotFather)
- An AI API key (e.g. OpenAI or Anthropic)
- A GitHub token for publishing to website (optional for local development)
# Clone the repository
git clone https://github.com/your-username/sy-daily.git
cd sy-daily
# Install dependencies
yarn install
Required Credentials:
- Telegram Bot Token: Get from @BotFather
- Telegram API Credentials: Get from my.telegram.org
- Session String: Generated the first time you run the app with valid API credentials. For that first run in local development, leave it blank, but make sure to temporarily disable the check in the src/telegram/user.ts file.
- AI API Key: Either an OpenAI or an Anthropic API key for content processing
- Channel IDs: Telegram channel IDs where you want to post the summaries
- Create environment file: Copy .env.example to .env and fill in your credentials.
DEV_PUBLIC_SERVER=your_public_server_url
TELEGRAM_BOT_TOKEN=your_bot_token
TELEGRAM_API_ID=your_api_id
TELEGRAM_API_HASH=your_api_hash
SESSION_STRING=your_session_string
# AI Provider Configuration (choose one or both)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
AI_MODEL=openai:gpt-4.1-2025-04-14
# Alternative: AI_MODEL=anthropic:claude-3-5-sonnet-20241022
# Telegram Channel Configuration
TELEGRAM_CHANNEL_ID_ENGLISH=your_english_channel_id
TELEGRAM_CHANNEL_ID_ARABIC=your_arabic_channel_id
# GitHub Configuration (optional for local development)
GITHUB_TOKEN=your_github_token
# Deployment Configuration (optional)
SIMULATE_WEBSITE_PUBLISH=false
ALERT_EMAIL=your_alert_email
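The `AI_MODEL` value pairs a provider with a model name, separated by a colon. A minimal sketch of how such a value could be split and matched to the API key it needs; `parseAiModel` and `requiredKeyFor` are hypothetical names, and the project's own provider selection in src/ai/getLLMProvider.ts may work differently.

```typescript
type Provider = "openai" | "anthropic";

interface ModelSpec {
  provider: Provider;
  model: string;
}

/** Splits "<provider>:<model>" at the first colon and validates the provider. */
function parseAiModel(value: string): ModelSpec {
  const separator = value.indexOf(":");
  const provider = separator === -1 ? "" : value.slice(0, separator);
  const model = separator === -1 ? "" : value.slice(separator + 1);
  if ((provider === "openai" || provider === "anthropic") && model.length > 0) {
    return { provider, model };
  }
  throw new Error(`Expected "<provider>:<model>", got "${value}"`);
}

/** Name of the environment variable that must be set for a provider. */
function requiredKeyFor(provider: Provider): "OPENAI_API_KEY" | "ANTHROPIC_API_KEY" {
  return provider === "openai" ? "OPENAI_API_KEY" : "ANTHROPIC_API_KEY";
}

const spec = parseAiModel("openai:gpt-4.1-2025-04-14");
console.log(spec.provider, spec.model, requiredKeyFor(spec.provider));
```

Validating the value at startup surfaces a missing or misspelled provider before any AI request is made.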
- Create or identify your Telegram channel IDs:
You can find your channel ID (typically a negative number) by starting the Telegram bot in your local environment:
yarn run telegram:serve
Then register the webhook:
yarn run telegram:register-webhook
Note: You need a way to tunnel your local server to the internet (e.g. ngrok), and you must set its public URL in the DEV_PUBLIC_SERVER environment variable.
Once the webhook is registered, send a message to the channel and the channel ID will appear in the console output.
- Run the bot locally:
# Start the news collection and posting process
yarn start
# This will:
# 1. Collect news from 30+ configured Telegram channels
# 2. Process and summarize the content using AI
# 3. Generate banner images for both languages
# 4. Post formatted summaries to your configured channels
Notes:
- The first time you run the app, it persists a cache in the cache/cachedData.json file. Later runs use this cache to skip collection and summarization, which speeds up the process and saves AI credits. You can delete the file to start fresh.
- The first time you run the app, it interactively asks for your Telegram user credentials to acquire a session string. You can subsequently use the SESSION_STRING environment variable to avoid this step. Make sure to temporarily disable the check in the src/telegram/user.ts file, set the SESSION_STRING environment variable, and then re-enable the check.
# Run tests to ensure everything works
yarn test
# Test with UI
yarn run test:ui
# Build to check for TypeScript errors
yarn run build
yarn run telegram:serve # Start Telegram development server
npm run banners:compose # Generate banner compositions
npm run banners:update # Update all composed banner variants
./scripts/simulate-daily-trigger.sh # Manually trigger CollectFunction
./scripts/pull-remote-files.sh # Download S3 bucket contents for debugging
- AWS Account
- AWS CLI (configured with the appropriate permissions)
- SAM CLI
- Docker
- Email address for DLQ alerts
npm run predeploy
npm run deploy
npm run telegram:register-webhook
npm run sam:build # Build SAM application
npm run sam:local # Start local Lambda environment
npm run sam:invoke:collect # Invoke Collect function locally
npm run sam:invoke:collect:event # Invoke Collect with scheduled event
npm run sam:invoke:post:english # Invoke PostToTelegramEnglish function
npm run sam:invoke:post:arabic # Invoke PostToTelegramArabic function
npm run sam:dev # Full dev workflow
src/
├── lambda/ # Lambda entry points (5-stage pipeline)
│ ├── Collect.ts # Stage 1: Collection handler
│ ├── Deduplicate.ts # Stage 2: Early deduplication handler
│ ├── Summarize.ts # Stage 3: Summarization handler
│ ├── PublishToWebsite.ts # Stage 4: Website publishing handler
│ └── PostToTelegram.ts # Stage 5: Posting handler (English/Arabic)
├── local/
│ └── index.ts # Local development entry point (full pipeline)
├── news-collection/
│ ├── collect.ts # Main collection logic
│ ├── extractSANAArticleContent.ts # Content extraction from articles
│ ├── browser.ts # Axios + JSDOM web scraping
│ ├── processSANATelegramPost.ts # Individual post processing
│ └── telegram/
│ └── getPostsInLast24Hours.ts # Multi-channel Telegram API integration
├── ai/
│ ├── deduplicate.ts # Multi-round AI deduplication with round-robin redistribution
│ ├── summarize.ts # Batch AI summarization (parallel processing)
│ ├── getLLMProvider.ts # AI provider abstraction (OpenAI/Anthropic)
│ └── customTerms.ts # Custom terminology handling
├── db/
│ ├── Table.ts # DynamoDB table configuration
│ └── BriefingEntity.ts # Briefing entity schema and state management operations
├── publish/
│ └── publishToGitHub.ts # GitHub API integration for website publishing
├── banner/
│ ├── newsBanner.ts # SVG-based banner generation with date overlay
│ ├── composeBanners.ts # Banner composition utility
│ └── bannersDemo.ts # Banner generation demo and testing
├── formatting/
│ ├── index.ts # Formatting system entry point
│ ├── telegramNewsFormatter.ts # Telegram message formatting
│ ├── measureTelegramRenderedHtml.ts # HTML rendering measurement
│ └── strings.ts # String constants and templates
├── telegram/
│ ├── bot.ts # Grammy-based Telegram bot
│ └── user.ts # Telegram user client for channel posting
├── telegram-dev/
│ ├── registerTelegramWebhook.ts # Webhook registration utility
│ └── server.ts # Development server
├── utils/
│ └── dateUtils.ts # Damascus timezone utilities
├── prioritizeNews.ts # News prioritization logic with label weighting
├── prioritizeAndFormat.ts # Combined prioritization and formatting
├── mostFrequentLabel.ts # News category detection for banners
└── types.ts # TypeScript type definitions and Zod schemas
assets/
├── fonts/ # Arabic fonts for banner generation
├── label-bgs/ # Category-specific background images (19 types)
├── logo-arabic.png # Arabic logo
├── logo-english.png # English logo
└── telegram-logo.png # Telegram branding
composedBanners/ # Pre-composed banners (uploaded to S3)
├── english/ # Pre-composed English banners
└── arabic/ # Pre-composed Arabic banners
channels.json # Channel configuration (30+ sources)
template.yml # AWS SAM template (6 Lambda functions)
esbuild.config.ts # Build configuration for Lambda bundling
vitest.config.ts # Test configuration
events/ # SAM local event files
├── s3-event.json # S3 event for testing Lambda functions
└── schedule-event.json # Scheduled event for testing collection
scripts/ # Deployment and testing utilities
├── simulate-daily-trigger.sh # Manually trigger CollectFunction
└── pull-remote-files.sh # Download S3 bucket contents for debugging
deploy.sh # Deployment script
updateComposedBanners.sh # Banner update utility
Stage 1: CollectFunction (Scheduled at 20:01 UTC daily)
1. State Initialization: Creates briefing record in DynamoDB
2. Collection: Uses Telegram API to fetch posts from 30+ configured channels in the last 24 hours (Damascus time)
3. Processing: Extracts article content from linked URLs using axios and JSDOM
4. Storage: Uploads raw collected posts to S3 at collected-news/{date}.json
5. State Update: Records collection completion timestamp in DynamoDB
6. Event Emission: Emits custom EventBridge event NewsCollected to trigger next stage
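The stages are chained purely through events, so the routing can be written down as a small table. The sketch below uses the event and function names from this README; the `Source` string, the `Detail` payload, and the helper names are hypothetical. The `EventEntry` interface mirrors the field names of an EventBridge PutEvents entry, but no AWS SDK call is made here.

```typescript
// Event and function names come from the stage descriptions; everything
// marked hypothetical is an assumption for illustration.
type PipelineEvent =
  | "NewsCollected"
  | "NewsDeduplicated"
  | "NewsSummarized"
  | "summaries-published";

// Which functions each event triggers.
const TRIGGERS: Record<PipelineEvent, string[]> = {
  NewsCollected: ["DeduplicateFunction"],
  NewsDeduplicated: ["SummarizeFunction"],
  NewsSummarized: ["PublishToWebsiteFunction"],
  "summaries-published": [
    "PostToTelegramEnglishFunction",
    "PostToTelegramArabicFunction",
  ],
};

// Same field names as an EventBridge PutEvents entry.
interface EventEntry {
  Source: string;
  DetailType: PipelineEvent;
  Detail: string;
}

/** Builds the entry a stage would emit once its output is stored. */
function buildEvent(detailType: PipelineEvent, date: string): EventEntry {
  return {
    Source: "sy-daily.pipeline", // hypothetical source name
    DetailType: detailType,
    Detail: JSON.stringify({ date }), // hypothetical payload; Detail is a JSON string
  };
}

const emitted = buildEvent("NewsCollected", "2025-07-15");
console.log(emitted.DetailType, "->", TRIGGERS[emitted.DetailType].join(", "));
```

Because `summaries-published` maps to two functions, the English and Arabic posts run independently from the same trigger.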
Stage 2: DeduplicateFunction (Triggered by custom EventBridge event NewsCollected)
7. State Validation: Checks briefing exists and hasn't been deduplicated
8. Retrieval: Downloads raw collected posts from S3
9. Multi-Round Deduplication: Implements AI-powered deduplication with round-robin redistribution
   - Splits items into batches of 150
   - Processes up to 5 batches in parallel per round
   - Redistributes items using round-robin between rounds to maximize deduplication opportunities
   - Continues until the 98% ratio threshold is reached or the maximum number of rounds is completed
   - Skips local file writes when running in the Lambda environment
10. Storage: Uploads deduplicated posts to S3 at deduplicated-news/{date}.json
11. State Update: Records deduplication completion timestamp in DynamoDB (with graceful error handling)
12. Event Emission: Emits custom EventBridge event NewsDeduplicated to trigger next stage
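The multi-round loop in step 9 can be sketched as follows. This is an illustration under stated assumptions, not the code in src/ai/deduplicate.ts: the AI call is abstracted as a `dedupeBatch` callback, the stop rule (stop once a round keeps at least 98% of its input) is one possible reading of the ratio threshold, and `maxRounds: 5` is a placeholder because no value is given here.

```typescript
// Hypothetical sketch; names, the stop rule, and maxRounds are assumptions.
type DedupeBatch<T> = (batch: T[]) => Promise<T[]>;

interface RoundOptions {
  batchSize: number;
  parallel: number;
  maxRounds: number;
  stopRatio: number;
}

const DEFAULTS: RoundOptions = { batchSize: 150, parallel: 5, maxRounds: 5, stopRatio: 0.98 };

/** Contiguous batches, used for the first round. */
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

/** Deals items across batches like cards, so items from different earlier batches can meet. */
function roundRobin<T>(items: T[], batchCount: number): T[][] {
  const batches: T[][] = Array.from({ length: batchCount }, () => [] as T[]);
  items.forEach((item, i) => batches[i % batchCount].push(item));
  return batches;
}

async function dedupeInRounds<T>(
  items: T[],
  dedupeBatch: DedupeBatch<T>,
  options: RoundOptions = DEFAULTS
): Promise<T[]> {
  let current = items;
  for (let round = 0; round < options.maxRounds && current.length > 0; round++) {
    const batchCount = Math.ceil(current.length / options.batchSize);
    const batches =
      round === 0 ? chunk(current, options.batchSize) : roundRobin(current, batchCount);
    const survivors: T[] = [];
    // Run the batches in groups of `parallel` concurrent requests.
    for (let i = 0; i < batches.length; i += options.parallel) {
      const group = batches.slice(i, i + options.parallel);
      const results = await Promise.all(group.map((batch) => dedupeBatch(batch)));
      for (const result of results) survivors.push(...result);
    }
    const ratio = survivors.length / current.length;
    current = survivors;
    if (ratio >= options.stopRatio) break; // the round removed too little to justify another
  }
  return current;
}
```

Duplicates can only be merged when they land in the same batch, which is why reshuffling between rounds gives items from different earlier batches a chance to be compared.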
Stage 3: SummarizeFunction (Triggered by custom EventBridge event NewsDeduplicated)
13. State Validation: Checks briefing exists and hasn't been summarized
14. Retrieval: Downloads deduplicated posts from S3
15. Batch Processing: Splits posts into batches of 20 items each
16. Parallel Summarization: Processes up to 30 batches in parallel using AI
17. Translation: Creates English summaries and translations from Arabic content
18. Storage: Uploads summarized data to S3 at summarized-news/{date}.json
19. State Update: Records summarization completion timestamp in DynamoDB (with graceful error handling)
20. Event Emission: Emits custom EventBridge event NewsSummarized to trigger next stage
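Steps 15 and 16 amount to fixed-size batching plus a cap on concurrent requests. A minimal sketch using a small worker pool; `summarizeBatch` stands in for the AI call, and all names here are hypothetical rather than taken from src/ai/summarize.ts.

```typescript
// Hypothetical sketch of batch summarization with bounded parallelism.
const BATCH_SIZE = 20;
const MAX_PARALLEL = 30;

/** Splits posts into consecutive batches of at most `size` items. */
function toBatches<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

/** Maps `fn` over `inputs` with at most `limit` calls in flight, preserving order. */
async function mapWithConcurrency<T, R>(
  inputs: T[],
  limit: number,
  fn: (input: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array<R>(inputs.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await fn(inputs[index]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, inputs.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

/** Summarizes all posts batch by batch and flattens the results in order. */
async function summarizeAll<T, R>(
  posts: T[],
  summarizeBatch: (batch: T[]) => Promise<R[]>
): Promise<R[]> {
  const perBatch = await mapWithConcurrency(toBatches(posts), MAX_PARALLEL, summarizeBatch);
  const all: R[] = [];
  for (const batch of perBatch) all.push(...batch);
  return all;
}
```

With these limits, up to 30 × 20 = 600 posts can be in flight at once; anything beyond that waits for a worker to free up.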
Stage 4: PublishToWebsiteFunction (Triggered by custom EventBridge event NewsSummarized)
21. State Validation: Checks briefing exists and hasn't been published to website
22. Retrieval: Downloads summarized news from S3
23. GitHub Publishing: Publishes content to GitHub repository for website deployment (or simulates publishing if SIMULATE_WEBSITE_PUBLISH=true)
24. State Update: Records website publishing completion timestamp in DynamoDB (with graceful error handling)
25. Event Emission: Emits custom EventBridge event summaries-published to notify Telegram functions
Stage 5: PostToTelegramFunction (both English and Arabic, triggered by custom EventBridge event summaries-published)
26. State Validation: Checks briefing exists and hasn't been posted for this language
27. Retrieval: Downloads summarized news from S3
28. Final Prioritization: Analyzes and prioritizes news items using weighted label system
29. Formatting: Formats news items into structured Telegram messages (language-specific with HTML formatting)
30. Banner Selection: Determines most frequent news category and fetches pre-composed banner from S3
31. Date Overlay: Adds date overlay to banner image
32. Publishing: Posts banner image with formatted summary to respective target Telegram channels via TelegramUser client
33. State Update: Records Telegram post URL in DynamoDB (with graceful error handling)
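Steps 28 and 30 can be illustrated with two small pure functions. This is a sketch only: the `NewsItem` shape, the label names, and the weights are hypothetical, and the project's actual logic in prioritizeNews.ts and mostFrequentLabel.ts may differ.

```typescript
interface NewsItem {
  title: string;
  labels: string[];
}

// Hypothetical weights; the real label set and weights are not listed here.
const LABEL_WEIGHTS: Record<string, number> = { security: 3, economy: 2, culture: 1 };
const DEFAULT_WEIGHT = 1;

/** Sum of the weights of an item's labels; unknown labels get the default weight. */
function score(item: NewsItem): number {
  return item.labels.reduce(
    (sum, label) => sum + (LABEL_WEIGHTS[label] ?? DEFAULT_WEIGHT),
    0
  );
}

/** Highest score first; copies the array so the caller's order is untouched. */
function prioritize(items: NewsItem[]): NewsItem[] {
  return [...items].sort((a, b) => score(b) - score(a));
}

/** The label that occurs most often across all items; ties go to the label seen first. */
function mostFrequentLabel(items: NewsItem[]): string | undefined {
  const counts = new Map<string, number>();
  for (const item of items) {
    for (const label of item.labels) {
      counts.set(label, (counts.get(label) ?? 0) + 1);
    }
  }
  let best: string | undefined;
  let bestCount = 0;
  counts.forEach((count, label) => {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  });
  return best;
}

const sample: NewsItem[] = [
  { title: "A", labels: ["economy"] },
  { title: "B", labels: ["security", "economy"] },
  { title: "C", labels: ["culture"] },
];
console.log(prioritize(sample).map((item) => item.title));
console.log(mostFrequentLabel(sample));
```

The most frequent label would then decide which pre-composed banner is fetched for the day.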
- Local development: executes pipeline stages locally via src/local/index.ts
- Local pipeline: Collect → Deduplicate → Summarize → Format → (Optionally) Post
- Tests both English and Arabic output
- Uses local caching system to avoid re-fetching and re-processing during development
MIT