An advanced AI-powered Discord bot featuring multimodal content processing, sophisticated game systems, and intelligent conversation management.
| Feature | Description |
|---|---|
| Real-time Streaming Chat | AI responses stream in real-time with intelligent message editing and context awareness |
| Dynamic Personality System | AI-powered conversation tone detection with configurable personalities and adaptive response styles |
| Advanced Image Editing | Google Imagen integration for multi-image editing operations with direct prompt processing |
| Multimodal Content Processing | Comprehensive support for images, videos, PDFs, URLs, and rich media analysis |
| Complete Game System | Six fully featured games with AI integration, including TicTacToe, an RPG, GeoGuesser, and more |
| Intelligent Message Routing | AI-powered intent classification and specialized flow orchestration |
| Advanced Security Framework | Standalone prompt injection testing, hierarchical operators, and comprehensive security validation |
| Real-time Debug UI | Modular flow monitoring interface with live statistics, WebSocket updates, and comprehensive debugging |
| Message Cache System | Sliding window conversation cache with 64-message threshold and context optimization |
| Content Detection Engine | Generic attachment caching and multimodal content analysis |
| Voice Generation | Multi-voice text-to-speech with Google AI integration |
| Feature | Description |
|---|---|
| Multi-Voice Support | 26 distinct AI voices with diverse personalities including Charon (professional), Fenrir (excitable), Aoede (musical), and Kore (authoritative) |
| Advanced Audio Processing | PCM to WAV conversion, duration calculation, and waveform visualization |
| Discord Integration | Seamless audio file delivery with metadata and interactive voice selection |
| Content Filtering | Built-in safety validation and content moderation |
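
The PCM to WAV conversion and duration calculation listed above can be sketched as follows. This is a minimal illustration, not the bot's actual implementation: the names `PcmFormat`, `pcmToWav`, and `pcmDurationSeconds` are hypothetical, and the default format (24 kHz, mono, 16-bit little-endian) is an assumption that should be checked against what the text-to-speech model actually returns.

```typescript
// Assumed PCM layout (hypothetical defaults, not taken from the bot's source).
interface PcmFormat {
  sampleRate: number;    // samples per second
  channels: number;      // 1 = mono
  bitsPerSample: number; // 16 = signed 16-bit little-endian
}

const DEFAULT_FORMAT: PcmFormat = { sampleRate: 24000, channels: 1, bitsPerSample: 16 };

/** Prepend a 44-byte RIFF/WAVE header to raw PCM samples. */
function pcmToWav(pcm: Uint8Array, format: PcmFormat = DEFAULT_FORMAT): Uint8Array {
  const { sampleRate, channels, bitsPerSample } = format;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second

  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for plain PCM
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the sample data in bytes
  wav.set(pcm, 44);                         // samples follow the header
  return wav;
}

/** Duration in seconds, derived from the byte count and the byte rate. */
function pcmDurationSeconds(byteLength: number, format: PcmFormat = DEFAULT_FORMAT): number {
  const byteRate = format.sampleRate * format.channels * (format.bitsPerSample / 8);
  return byteLength / byteRate;
}
```

Under these assumed defaults, one second of audio is 48,000 bytes of PCM, so the resulting WAV file is 48,044 bytes long.
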
| Game | Description |
|---|---|
| AI Uprising | Immersive text-based RPG with AI-powered narrative, combat system, character progression, and equipment management |
| TicTacToe | Classic game with intelligent AI opponent featuring three difficulty levels and strategic decision-making |
| GeoGuesser | Geographic guessing game with AI validation, curated location database, and Mapillary street imagery |
| Blackjack | Full casino-style card game with betting system, chip management, and standard blackjack rules |
| Hangman | Word guessing game with AI word generation, visual progression, and progressive hint system |
| WordScramble | Programming-focused vocabulary puzzle with multiplayer support and technology terms |
| Feature | Description |
|---|---|
| AI-Powered Tone Detection | Analyzes user messages and conversation context to determine optimal response personality |
| Configurable Personalities | JSON-based personality definitions with custom system prompts and behavioral patterns |
| Context-Aware Selection | Intelligent personality routing based on message content, user intent, and conversation history |
| Enhanced Media Awareness | Personality system includes advanced media processing capabilities with conversation history context and intelligent media filtering |
| Fallback Reliability | Graceful degradation to general personality for consistent user experience |
| Structured Output Validation | Zod schema validation ensuring reliable personality detection and selection |
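
The combination of structured output validation and fallback to the general personality might look roughly like the sketch below. To stay dependency-free it uses a hand-written type guard where the bot uses a Zod schema (for instance via `safeParse`). The result shape, the personality ids, and the function names are all assumptions made for illustration.

```typescript
// Hypothetical shape of the tone-detection result; field names are assumptions.
interface PersonalitySelection {
  personality: string;
  confidence: number; // expected range 0..1
}

// Hypothetical ids; the real set would come from the JSON personality definitions.
const KNOWN_PERSONALITIES: ReadonlySet<string> = new Set(["general", "technical", "playful"]);
const FALLBACK_SELECTION: PersonalitySelection = { personality: "general", confidence: 0 };

/** Type guard standing in for a schema check on the model's structured output. */
function isPersonalitySelection(value: unknown): value is PersonalitySelection {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const personality = record["personality"];
  const confidence = record["confidence"];
  return (
    typeof personality === "string" &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1
  );
}

/** Parse raw model output; fall back to the general personality on any problem. */
function selectPersonality(rawModelOutput: string): PersonalitySelection {
  try {
    const parsed: unknown = JSON.parse(rawModelOutput);
    if (isPersonalitySelection(parsed) && KNOWN_PERSONALITIES.has(parsed.personality)) {
      return parsed;
    }
  } catch {
    // Malformed JSON falls through to the fallback below.
  }
  return FALLBACK_SELECTION;
}
```

The point of the design is that a malformed or unexpected model response never surfaces to the user as an error: every failure path resolves to the same general personality.
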
| Capability | Description |
|---|---|
| Multi-Image Operations | Support for combining, merging, and complex editing operations across multiple images |
| Direct Prompt Processing | Takes user editing instructions exactly as-is without parsing or modification |
| Google Imagen Integration | Optimized parameters and specialized prompting for professional image editing |
| Comprehensive Scenarios | Single image edits, multi-image combining, style modifications, and object manipulation |
| Type-Safe Processing | Zod schema validation throughout the image editing pipeline |
| Capability | Description |
|---|---|
| Image Generation | Gemini 2.0 Flash image generation with natural language prompt parsing and multiple artistic styles |
| Code Execution | Server-side Python code execution with real-time streaming and comprehensive error handling |
| Search Grounding | Real-time web search integration with citation support and source verification |
| Video Analysis | YouTube and general video processing with multimodal AI understanding from attachments and conversation history |
| PDF Processing | Document analysis and content extraction with streaming responses from both current and cached documents |
| URL Context | Web page analysis and content summarization with intelligent routing |
| Conversation History Media | AI-powered intelligent access and analysis of images, videos, and PDFs from recent conversation history when contextually relevant |
| Smart Media Filtering | AI-powered relevance detection that automatically filters conversation media to include only contextually relevant items, reducing token usage and improving response accuracy |
| Component | Description |
|---|---|
| Flow Orchestrator | Central routing hub analyzing content and directing to appropriate specialized processing flows with configurable AI routing models |
| Content Detection Service | Comprehensive analysis of message content to determine optimal processing strategies |
| Message Validator | Response strategy determination based on mentions, replies, game mode, and autonomous opportunities |
| Context Optimization | AI-powered conversation history filtering with relevance scoring and token optimization |
| Media Relevance Engine | Advanced AI-powered system that analyzes cached media from conversation history to determine contextual relevance to current user requests, featuring configurable thresholds, multi-dimensional scoring, and performance optimization |
| Attachment Caching | Generic caching system eliminating duplicate downloads and enabling instant access |
| MCP Integration | Model Context Protocol server management with dynamic tool discovery, parameter resolution, and intelligent tool chaining capabilities |
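
One way to get the "no duplicate downloads" behavior described for attachment caching is to cache the download promise itself, keyed by URL, so that concurrent requests for the same attachment share a single fetch. The sketch below assumes that approach. `AttachmentCache` and `Downloader` are hypothetical names, and the real service may key on something other than the URL.

```typescript
// Hypothetical download function; the bot's real fetch logic is not shown in this README.
type Downloader = (url: string) => Promise<Uint8Array>;

/** Caches in-flight and completed downloads so each URL is fetched at most once. */
class AttachmentCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly download: Downloader;

  constructor(download: Downloader) {
    this.download = download;
  }

  get(url: string): Promise<Uint8Array> {
    const cached = this.entries.get(url);
    if (cached !== undefined) return cached; // reuse the pending or finished download

    const pending = this.download(url).catch((error: unknown) => {
      this.entries.delete(url); // do not cache failures, so a later call can retry
      throw error;
    });
    this.entries.set(url, pending);
    return pending;
  }
}
```

Because the promise is stored before it settles, two flows that ask for the same attachment at the same moment receive the same promise and trigger only one download.
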
| Feature | Description |
|---|---|
| Dynamic Server Discovery | Automatic MCP server detection and connection management with configuration caching |
| Intelligent Tool Selection | AI-powered analysis of available MCP tools with automatic parameter mapping and validation |
| Tool Chaining | Advanced capability to chain multiple MCP tools together for complex multi-step operations |
| Parameter Resolution | Dynamic parameter resolution system that intelligently maps Discord message content to MCP tool requirements |
| Configurable Routing | Flexible routing model configuration allowing different AI models for different analysis tasks |
| Real-time Tool Discovery | Live discovery and integration of new MCP tools without bot restart |
| Schema Validation | Comprehensive validation of MCP tool schemas with automatic compatibility checking |
| Feature | Description |
|---|---|
| Prompt Injection Testing | Standalone testing framework for validating system prompt resilience against injection attacks |
| Centralized System Prompts | Unified prompt management system with personality-specific prompt loading and caching |
| Hierarchical Operators | Primary operator from environment with sub-operator management capabilities |
| Channel Whitelisting | Separate bot and autonomous response whitelists with database persistence |
| Domain Security | Strict domain whitelisting for media processing (Discord CDN, approved platforms only) |
| Content Validation | Comprehensive file size, type, and security validation for all attachments |
| Natural Language Auth | Conversational interface for managing operators and whitelists through AI understanding |
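
Strict domain whitelisting is easy to get subtly wrong with substring checks, so a short sketch may help. The version below parses the URL and compares the hostname exactly. The two Discord CDN hosts are real, but the list as a whole and the name `isAllowedMediaUrl` are assumptions, since the README does not enumerate the approved platforms.

```typescript
// Hypothetical allowlist; the bot's actual list of approved platforms is not shown here.
const ALLOWED_MEDIA_HOSTS: ReadonlySet<string> = new Set([
  "cdn.discordapp.com",
  "media.discordapp.net",
]);

/** Returns true only for HTTPS URLs whose hostname is exactly on the allowlist. */
function isAllowedMediaUrl(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  // Exact match on the parsed hostname: "cdn.discordapp.com.evil.example"
  // and "evil.example/?u=cdn.discordapp.com" must both be rejected.
  return ALLOWED_MEDIA_HOSTS.has(url.hostname);
}
```

Matching on the parsed `hostname` rather than on the raw string avoids lookalike hosts, credentials embedded before an `@`, and allowlisted names smuggled into the path or query.
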
| System | Description |
|---|---|
| Message Cache Service | Sliding window conversation cache with automatic initialization and context management |
| Game State Persistence | Complete game state serialization and recovery with timeout management |
| Prisma Database | SQLite-based storage for users, channels, messages, and game sessions |
| Relevance Scoring | Multi-dimensional conversation analysis for intelligent context optimization |
| Generic Attachment System | Unified caching architecture for images, PDFs, videos, and future content types |
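
A sliding window conversation cache can be sketched as a per-channel array that drops its oldest entries once a limit is exceeded. The example below assumes the 64-message threshold mentioned earlier is the maximum number of messages retained per channel. The README does not spell that out, and `SlidingWindowCache` and `CachedMessage` are hypothetical names.

```typescript
// Hypothetical message shape; the real cache likely stores more fields.
interface CachedMessage {
  id: string;
  authorId: string;
  content: string;
}

/** Keeps only the most recent messages for each channel. */
class SlidingWindowCache {
  private readonly channels = new Map<string, CachedMessage[]>();
  private readonly maxMessages: number;

  constructor(maxMessages: number = 64) {
    this.maxMessages = maxMessages;
  }

  add(channelId: string, message: CachedMessage): void {
    const messages = this.channels.get(channelId) ?? [];
    messages.push(message);
    // Drop the oldest entries once the window exceeds its limit.
    if (messages.length > this.maxMessages) {
      messages.splice(0, messages.length - this.maxMessages);
    }
    this.channels.set(channelId, messages);
  }

  /** Messages for a channel, oldest first; empty if the channel is unknown. */
  getContext(channelId: string): readonly CachedMessage[] {
    return this.channels.get(channelId) ?? [];
  }
}
```

With the default limit, adding 70 messages to one channel leaves the 64 most recent, and other channels are unaffected.
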
| Component | Description |
|---|---|
| Real-time Flow Monitoring | Live WebSocket-powered monitoring of all AI flow executions with instant updates |
| Modular Architecture | Component-based architecture, refactored from a monolithic UI, built from AppHeader, ConnectionStatus, ControlPanel, and FlowList |
| Intelligent Flow Sequences | Automatic detection and visualization of flow delegation patterns (MESSAGE_ROUTING → CONVERSATION → AUTH_FLOW) |
| Interactive Flow Cards | Comprehensive flow metadata display with expandable content, action buttons, and performance analysis |
| Advanced Filtering System | Real-time filtering by flow type and user ID, with debounced input handling and persistent state |
| Connection Management | Detailed connection status tracking, uptime statistics, and automatic reconnection handling |
| Accessibility Support | WCAG compliance with keyboard navigation, screen reader announcements, and keyboard shortcuts |
| Export & Analysis | Data export capabilities, performance metrics, and comprehensive debugging information |

Usage:

```bash
# Start debug UI development server
npm run dev:debug

# Run bot with debug UI simultaneously
npm run dev:debug-full

# Run bot with monitoring only (no UI)
npm run dev:monitor
```

The debug interface runs at http://localhost:3001. It uses a dark theme with Bootstrap 5 styling and provides real-time WebSocket updates and flow analysis.

- Node.js 18+ with the npm package manager
- Discord Bot Token from Discord Developer Portal
- Google AI API Key with Genkit access
- Windows environment (optimized for Windows development)

```bash
# 1. Clone the repository
git clone https://github.com/your-username/gemini-discord-bot-rewrite.git
cd gemini-discord-bot-rewrite

# 2. Install dependencies
npm install

# 3. Set up environment variables
cp .env.example .env
# Edit .env with your API keys and configuration

# 4. Initialize the database
npm run db:init

# 5. Build the project
npm run build

# 6. Start the bot
npm start

# 7. For development with hot reload
npm run dev
```