Build a Multi-Agent Architecture
Create a multi-agent architecture using Deepgram’s Voice Agent API, where specialized agents handle different phases of customer interactions through seamless handoffs.
For more information and to use our reference implementation, visit the Deepgram Multi-Agent repo.
Traditional single-agent voice systems face fundamental limitations as complexity grows. This implementation demonstrates how to overcome these challenges by treating the conversation as a sequence of specialized phases/states, where each phase has its own:
This approach solves critical problems:
Key Architecture Points:
This implementation creates separate Voice Agent sessions for each specialized agent. The CallOrchestrator (orchestrator/call_orchestrator.py) manages these transitions while maintaining the Twilio connection.
A critical implementation detail: the audio forwarding task starts once and persists throughout all agent transitions. This ensures the Twilio WebSocket remains active while agents switch:
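The pattern can be illustrated with plain asyncio. This is a minimal sketch, not the repo's code: `AudioBridge`, `FakeAgent`, and `send_audio` are hypothetical names, and a queue stands in for the Twilio WebSocket. One long-lived task forwards caller audio to whichever agent is current, so swapping agents never restarts the forwarder.

```python
import asyncio
from typing import List, Optional


class FakeAgent:
    """Hypothetical stand-in for a Voice Agent session; it only records audio."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[bytes] = []

    async def send_audio(self, chunk: bytes) -> None:
        self.received.append(chunk)


class AudioBridge:
    """Forwards caller audio to whichever agent is currently active."""

    def __init__(self) -> None:
        self.inbound: asyncio.Queue = asyncio.Queue()  # stands in for Twilio audio
        self.current_agent: Optional[FakeAgent] = None

    async def forward_audio(self) -> None:
        # Started once per call; it outlives every individual agent session.
        while True:
            chunk = await self.inbound.get()
            try:
                if chunk is None:  # sentinel: the caller hung up
                    return
                if self.current_agent is not None:
                    await self.current_agent.send_audio(chunk)
            finally:
                self.inbound.task_done()


async def demo():
    bridge = AudioBridge()
    forwarder = asyncio.create_task(bridge.forward_audio())

    qualifier, advisor = FakeAgent("qualifier"), FakeAgent("advisor")
    bridge.current_agent = qualifier
    await bridge.inbound.put(b"hello")
    await bridge.inbound.join()  # wait until the chunk has been forwarded

    bridge.current_agent = advisor  # agent switch; the forwarder keeps running
    await bridge.inbound.put(b"pricing?")
    await bridge.inbound.join()

    await bridge.inbound.put(None)  # end of call
    await forwarder
    return qualifier.received, advisor.received
```

Running `asyncio.run(demo())` shows each agent receiving only the audio that arrived while it was active, with a single forwarding task for the whole call.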
Voice Agent connections are async context managers that require manual lifecycle management:
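As a rough illustration of manual lifecycle management, the sketch below keeps a context manager open across method calls by invoking `__aenter__` and `__aexit__` directly instead of using a single `async with` block. `open_agent_connection` and `AgentSession` are hypothetical stand-ins, not SDK or repo names, and closing the old session before opening the next one is only one possible ordering.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import List

events: List[str] = []  # records open/close order for the demo


@asynccontextmanager
async def open_agent_connection(name: str):
    """Hypothetical stand-in for an SDK connection context manager."""
    events.append(f"open:{name}")
    try:
        yield name
    finally:
        events.append(f"close:{name}")


class AgentSession:
    """Holds a connection open across calls rather than inside one `async with`."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._ctx = None
        self.connection = None

    async def start(self) -> None:
        # Enter the context manually so the connection survives this method.
        self._ctx = open_agent_connection(self.name)
        self.connection = await self._ctx.__aenter__()

    async def stop(self) -> None:
        # Exit manually; safe to call more than once.
        if self._ctx is not None:
            await self._ctx.__aexit__(None, None, None)
            self._ctx = None
            self.connection = None


async def demo() -> List[str]:
    events.clear()
    qualifier = AgentSession("qualifier")
    await qualifier.start()
    await qualifier.stop()  # assumed ordering: close before the next agent opens
    advisor = AgentSession("advisor")
    await advisor.start()
    await advisor.stop()
    return list(events)
```

The cost of this approach is that every `start` needs a matching `stop` on all exit paths, including errors, which `async with` would otherwise guarantee.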
Functions must be called with proper timing to maintain conversation flow:
Each agent represents a distinct conversation phase with focused responsibilities:
Qualifier (agents/qualifier/config.py): handoff_to_next_agent, end_conversation
Advisor (agents/advisor/config.py): handoff_to_next_agent, end_conversation
Closer (agents/closer/config.py): schedule_followup, record_satisfaction, end_conversation
When an agent calls handoff_to_next_agent, the orchestrator:
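The routing decision alone can be sketched as a small lookup. This assumes the agents run in the fixed order qualifier, advisor, closer, which is inferred from which agents expose `handoff_to_next_agent`. `next_agent` and `handle_function_call` are hypothetical helpers, not functions from the repo.

```python
from typing import Optional

# Assumed order, inferred from which agents expose handoff_to_next_agent.
AGENT_ORDER = ["qualifier", "advisor", "closer"]


def next_agent(current: str) -> Optional[str]:
    """Return the agent that follows `current`, or None after the last one."""
    index = AGENT_ORDER.index(current)
    if index + 1 < len(AGENT_ORDER):
        return AGENT_ORDER[index + 1]
    return None


def handle_function_call(current: str, function_name: str) -> Optional[str]:
    """Decide which agent should be active after a function call.

    Returns the agent name to run next, or None when the call should end.
    """
    if function_name == "handoff_to_next_agent":
        return next_agent(current)
    if function_name == "end_conversation":
        return None
    return current  # other functions do not change the active agent
```

For example, `handle_function_call("qualifier", "handoff_to_next_agent")` selects the advisor, while `schedule_followup` leaves the closer in place.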
Each agent is configured with settings for STT, LLM, TTS, and functions:
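To make the shape concrete, here is a hedged sketch of a settings payload built as a plain dictionary. The field names follow my understanding of the Voice Agent API `Settings` message and should be checked against the API reference; the model names are placeholders, and `build_agent_settings` is a hypothetical helper rather than code from the agent config files.

```python
from typing import Any, Dict, List


def build_agent_settings(prompt: str, functions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble one agent's settings: STT (listen), LLM (think), TTS (speak)."""
    return {
        "type": "Settings",
        "audio": {
            # Twilio media streams carry 8 kHz mu-law audio.
            "input": {"encoding": "mulaw", "sample_rate": 8000},
            "output": {"encoding": "mulaw", "sample_rate": 8000, "container": "none"},
        },
        "agent": {
            # Model names below are placeholders, not the repo's choices.
            "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
            "think": {
                "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
                "prompt": prompt,
                "functions": functions,
            },
            "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
        },
    }
```

Keeping the audio and provider settings in one helper means each agent differs only in its prompt and function list, which is what the specialized-agent design calls for.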
Functions include detailed descriptions to guide the LLM’s behavior. See agents/shared/functions.py for examples. For comprehensive function definition best practices, refer to docs/FUNCTION_GUIDE.md.
Key pattern: Functions should wait for customer confirmation:
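One way to express this is in the function description itself. The definition below is a sketch that assumes a JSON-schema style `parameters` block. The description wording and the `reason` parameter are illustrative assumptions, not copied from agents/shared/functions.py.

```python
# Hypothetical definition; the real one lives in agents/shared/functions.py.
HANDOFF_FUNCTION = {
    "name": "handoff_to_next_agent",
    "description": (
        "Transfer the customer to the next specialist. "
        "First offer the transfer and wait for the customer's reply. "
        "Call this function ONLY after the customer has explicitly agreed. "
        "Never call it in the same turn in which you make the offer."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            # Illustrative parameter, assumed for this sketch.
            "reason": {
                "type": "string",
                "description": "Short summary of why the customer is being transferred.",
            },
        },
        "required": ["reason"],
    },
}
```

Spelling out the order of events (offer, wait, then call) in the description gives the LLM an explicit rule to follow instead of leaving the timing implicit.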
This implementation uses the Deepgram Python SDK:
Copy .env.example to .env and add your API keys:
The key environment variables (config.py manages these):
DEEPGRAM_API_KEY - Your Deepgram API key
GROQ_API_KEY - Groq API key for conversation summarization
TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER - Twilio credentials
LEAD_SERVER_EXTERNAL_URL - Your public tunnel URL
LEAD_PHONE_NUMBER - Phone number to call
Copy the public URL and update LEAD_SERVER_EXTERNAL_URL in your .env file.
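Before placing a call, it can help to confirm that every variable is set. The check below is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper and is not part of the repo's config.py.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the list above.
REQUIRED_VARS = [
    "DEEPGRAM_API_KEY",
    "GROQ_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "LEAD_SERVER_EXTERNAL_URL",
    "LEAD_PHONE_NUMBER",
]


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` at startup and exiting with a clear message when the list is non-empty surfaces a missing key before any call is attempted.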
The system will:
LEAD_PHONE_NUMBER
Here’s a simplified flow showing agent transitions and function calls:
Edit the prompts in each agent’s config file:
Qualifier (agents/qualifier/config.py):
Advisor (agents/advisor/config.py):
Closer (agents/closer/config.py):
Important: See docs/PROMPT_GUIDE.md for voice-specific prompt engineering best practices.
Quick overview:
agents/your_agent/config.py with agent configuration
orchestrator/call_orchestrator.py transition logic
Edit utils/context_summarizer.py to: