End-to-end open-source voice agents platform: quickly build LLM-based, voice-driven conversational applications
Bolna is an end-to-end, open-source, production-ready framework for quickly building LLM-based, voice-driven conversational applications.
Demo video: demo-create-agent-and-make-calls.mp4
Bolna helps you create AI Voice Agents which can be instructed to do tasks beginning with:
- Initiating a phone call using telephony providers like Twilio, etc.
- Transcribing the conversations using Deepgram, etc.
- Using LLMs like OpenAI, etc. to handle conversations
- Synthesizing LLM responses back to telephony using AWS Polly, XTTS, etc.
- Instructing the agent to perform tasks like sending emails, text messages, or booking calendar events after the conversation has ended
Refer to the docs for a deep dive into all supported providers.
This repo contains the following types of agents in the agents/agent_types directory which can be used to create conversational applications:
- contextual_conversational_agent: LLM-based free-flow agent
- graph_based_conversational_agent: LLM-based agent with classification
- extraction_agent: currently WIP. Feel free to contribute and open a PR
A basic local setup uses Twilio for telephony. We have dockerized the setup in local_setup/. One will need to populate an environment .env file from .env.sample.
The setup consists of four containers:
- Twilio web server: for initiating the calls one will need to set up a Twilio account
- Bolna server: for creating and handling agents
- ngrok: for tunneling. One will need to add the authtoken to ngrok-config.yml
- redis: for persisting agents' & users' contextual data
Running docker-compose up --build will start all containers, using .env as the environment file and agent_data as the agents directory.
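The exact variable names in .env come from .env.sample and depend on the providers you pick; the keys below are assumptions used only for illustration. A small sketch like this can sanity-check the environment file before bringing the containers up:

```python
# check_env.py -- a minimal sketch; the variable names below are assumptions
# based on the providers mentioned above, not the authoritative list from
# .env.sample (check that file for the real keys).
from dotenv import dotenv_values  # pip install python-dotenv

ASSUMED_KEYS = [
    "TWILIO_ACCOUNT_SID",   # hypothetical telephony credentials
    "TWILIO_AUTH_TOKEN",
    "DEEPGRAM_AUTH_TOKEN",  # hypothetical transcription credentials
    "OPENAI_API_KEY",       # hypothetical LLM credentials
]

env = dotenv_values(".env")  # parses the .env file without touching os.environ
missing = [key for key in ASSUMED_KEYS if not env.get(key)]

if missing:
    print("Missing or empty values in .env: " + ", ".join(missing))
else:
    print("All assumed provider keys are present; ready for docker-compose up --build")
```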
Once the docker containers are up, you can now start to create your agents and instruct them to initiate calls.
The repo contains examples in the agent_data directory as a reference for creating application agents:
- airbnb_job: A streaming conversation agent where the agent screens potential candidates for a job at AirBnB
- sorting_hat: A preprocessed conversation agent which acts as a Sorting Hat for Hogwarts
- yc_screening: A streaming conversation agent which acts as a Y Combinator partner asking questions around the idea/startup
- indian_elections_vernacular: A streaming conversation agent which asks people for their outlook towards the Indian elections, in Hindi
- sample_agent: A boilerplate sample agent to start building your own agent!
All agents are read from the agent_data directory. We have provided some samples for getting started. A dashboard is also in the works (still WIP) which will make creating agents easier.
General structure of the agents:
your-awesome-agent-name
├── conversation_details.json # Compiled prompt
└── users.json # List of users that the call would be made to
| Agent type | Streaming agent | Preprocessed agent |
|---|---|---|
| Introduction | A streaming agent will work like a free-flow conversation following the prompt | Apart from following the prompt, a preprocessed agent will have all of its responses preprocessed in the form of audio, which will be streamed as per the classification of the human's response |
| Prompt | Required (defined in conversation_details.json) | Required (defined in conversation_details.json) |
| Preprocessing | Not required | Required (using scripts/preprocess.py) |
Note
Currently, users.json has the following user attributes, which get substituted into the prompt to customize it for each call. More to be added soon!
- first_name
- last_name
- honorific
For instance, in the case of a preprocessed agent, the initial intro could be customized to have the user's name.
The prompt itself can also be customized to fill in user contextual details from users.json; for example, {first_name} can be referenced in both the prompt and the prompt intro.
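As a rough illustration of how that substitution can work, here is a minimal sketch. The directory layout matches the agent structure above; the prompt key inside conversation_details.json is an assumption, so treat this as an illustration rather than Bolna's internal logic:

```python
# personalize_prompts.py -- a minimal sketch of {first_name}-style substitution.
# Only first_name, last_name and honorific come from the attributes listed above;
# the "prompt" key inside conversation_details.json is an assumed placeholder.
import json

agent_dir = "agent_data/your-awesome-agent-name"

with open(f"{agent_dir}/conversation_details.json") as f:
    conversation_details = json.load(f)

with open(f"{agent_dir}/users.json") as f:
    users = json.load(f)  # list of users that the call would be made to

prompt_template = conversation_details["prompt"]  # e.g. "Hello {honorific} {last_name}, ..."

for user in users:
    # str.format fills the {first_name}, {last_name} and {honorific} placeholders
    personalized_prompt = prompt_template.format(
        first_name=user["first_name"],
        last_name=user["last_name"],
        honorific=user["honorific"],
    )
    print(personalized_prompt)
```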
- Create a directory under the agent_data directory with the name of your agent
- Create your prompt and save it in a file called conversation_details.json using the example provided (a scaffolding sketch of these first two steps follows this list)
- Optional: in case you are creating a preprocessed agent, generate the audio data used by running the script scripts/preprocess.py
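A minimal scaffolding sketch for the first two steps, assuming placeholder JSON contents (copy the real schema from the provided samples such as sample_agent):

```python
# scaffold_agent.py -- a minimal sketch that creates the directory layout shown
# earlier. The JSON bodies are placeholders; take the real schema from the
# sample agents provided in agent_data.
import json
from pathlib import Path

agent_dir = Path("agent_data/your-awesome-agent-name")
agent_dir.mkdir(parents=True, exist_ok=True)

# Placeholder compiled prompt; the actual conversation_details.json structure
# should be copied from the examples in this repo.
conversation_details = {
    "prompt": "You are a helpful voice agent. Greet {honorific} {last_name} by name."
}
(agent_dir / "conversation_details.json").write_text(json.dumps(conversation_details, indent=2))

# List of users that the calls would be made to, using the attributes listed above.
users = [{"first_name": "Ada", "last_name": "Lovelace", "honorific": "Ms."}]
(agent_dir / "users.json").write_text(json.dumps(users, indent=2))

print(f"Scaffolded agent at {agent_dir}")
```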
- At this point, the docker containers should be up and running
- Your agent prompt should be defined in the agent_data/ directory in conversation_details.json, with the user list in users.json
- Create your agent using the Bolna Create Agent API. An agent will get created with an agent_id (a hedged sketch of this step follows the list)
- Instruct the agent to initiate calls to users via scripts/initiate_agent_call.py <agent_name> <agent_id>
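For the last two steps, a hedged sketch of the flow is below. The Bolna server's host, port, endpoint path and payload shape are assumptions made only for illustration; the real contract is the Create Agent API described in the docs:

```python
# create_and_call.py -- a hedged sketch of the create-agent-then-call flow.
# The endpoint path, port and payload shape are assumptions; consult the Bolna
# Create Agent API docs for the actual contract.
import subprocess

import requests

BOLNA_SERVER = "http://localhost:5001"  # assumed host/port of the Bolna server container
agent_name = "your-awesome-agent-name"

# Assumed route and payload; only the fact that an agent_id is returned
# comes from the steps above.
response = requests.post(f"{BOLNA_SERVER}/create_agent", json={"agent_name": agent_name})
response.raise_for_status()
agent_id = response.json()["agent_id"]

# Hand the agent_id to the provided script to start calling the users list.
subprocess.run(
    ["python", "scripts/initiate_agent_call.py", agent_name, agent_id],
    check=True,
)
```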
Though the repository is completely open source, you can connect with us if interested in managed offerings or more customized solutions.
We love all types of contributions: whether big or small, they help improve this community resource.
- There are a number of open issues which can be good ones to start with
- If you have suggestions for enhancements, wish to contribute a simple fix such as correcting a typo, or want to address an apparent bug, please feel free to initiate a new issue or submit a pull request
- If you're contemplating a larger change or addition to this repository, be it in terms of its structure or features, kindly begin by opening a new issue and outlining your proposed changes. This will allow us to engage in a discussion before you dedicate a significant amount of time or effort. Your cooperation and understanding are appreciated.