This is a PoC demonstrating how two bots can autonomously "speak" to each other using an LLM and TTS. It uses NATS JetStream for message routing, Ollama for generating text with an LLM of the user's choice, and the PlayHT API for TTS speech synthesis.
Read the blog post that motivated this project: https://cybernetist.com/2024/04/25/go-or-rust-just-listen-to-the-bots/
> [!IMPORTANT]
> This project was built purely for educational purposes and is thus likely riddled with bugs, inefficiencies, etc.
> You should consider this project highly experimental.
Click to watch/listen to a sample conversation:
```mermaid
sequenceDiagram
    participant GoTTS as TTS
    participant GoLLM as LLM
    participant Gobot
    participant Rustbot
    participant RustLLM as LLM
    participant RustTTS as TTS
    Gobot->>+Rustbot: Hi Rustbot!
    Rustbot->>RustLLM: Hi Rustbot!
    RustLLM->>RustTTS: Hi Gobot!
    RustLLM->>Rustbot: Hi Gobot!
    Rustbot->>-Gobot: Hi Gobot!
    activate Gobot
    Gobot->>GoLLM: Hi Gobot!
    GoLLM->>GoTTS: Teach me about Rust!
    GoLLM->>Gobot: Teach me about Rust!
    Gobot->>-Rustbot: Teach me about Rust!
```
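The turn-taking above can be sketched in plain Go, with buffered channels standing in for the JetStream subjects each bot reads from and a canned `reply` function standing in for the LLM and TTS round trip (all names here are illustrative, none come from the actual code):

```go
package main

import "fmt"

// reply is a stand-in for the LLM generation + TTS playback each bot
// performs before forwarding its answer to the other bot.
func reply(name, msg string) string {
	return fmt.Sprintf("%s heard %q", name, msg)
}

func main() {
	toRustbot := make(chan string, 1) // stands in for the subject rustbot reads
	toGobot := make(chan string, 1)   // stands in for the subject gobot reads
	done := make(chan struct{})

	// rustbot: answer two messages from gobot.
	go func() {
		for i := 0; i < 2; i++ {
			toGobot <- reply("rustbot", <-toRustbot)
		}
		close(done)
	}()

	// gobot: kick off the conversation, then answer rustbot once.
	toRustbot <- "Hi Rustbot!"
	toRustbot <- reply("gobot", <-toGobot)
	fmt.Println(<-toGobot)
	<-done
}
```

The point of the sketch is the hand-off: neither side speaks until it has received the other's message, which is exactly why rustbot idles until gobot is prompted.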
High-level architecture:
```mermaid
flowchart TB
    subgraph " "
    playht(PlayHT API)
    ollama(Ollama)
    end
    bot <--> ollama
    bot <--> playht
    bot <--> NATS[[NATS JetStream]]
```
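One way to read the diagram: each bot composes three external dependencies. Here is a minimal Go sketch with hypothetical interfaces and in-memory fakes; the real project talks to Ollama, PlayHT and NATS JetStream through their client libraries, and none of these type names come from the actual code:

```go
package main

import "fmt"

// Hypothetical seams for the three external services in the diagram.
type LLM interface{ Generate(prompt string) string }
type TTS interface{ Speak(text string) }
type Broker interface {
	Publish(subject, msg string)
	Next(subject string) string
}

// Turn models one conversational turn: receive the peer's message,
// generate a reply, speak it, and publish it for the other bot.
func Turn(llm LLM, tts TTS, broker Broker, in, out string) string {
	msg := broker.Next(in)
	reply := llm.Generate(msg)
	tts.Speak(reply)
	broker.Publish(out, reply)
	return reply
}

// In-memory fakes so the sketch runs without any of the real services.
type echoLLM struct{}

func (echoLLM) Generate(p string) string { return "re: " + p }

type silentTTS struct{}

func (silentTTS) Speak(string) {}

type memBroker struct{ subjects map[string][]string }

func (b *memBroker) Publish(s, m string) { b.subjects[s] = append(b.subjects[s], m) }
func (b *memBroker) Next(s string) string {
	m := b.subjects[s][0]
	b.subjects[s] = b.subjects[s][1:]
	return m
}

func main() {
	b := &memBroker{subjects: map[string][]string{"go": {"Hi Gobot!"}}}
	fmt.Println(Turn(echoLLM{}, silentTTS{}, b, "go", "rust"))
}
```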
> [!NOTE]
> Mermaid does not properly support controlling diagram layout or even basic graph legends. There are some terrible workarounds, so I've opted not to use them in this README; hence the diagram might feel a bit unwieldy.
Zoomed-in view of a single bot's internals:

```mermaid
flowchart TB
    ollama{Ollama}
    playht{PlayHT}
    llm((llm))
    tts((tts))
    jetWriter((jetWriter))
    jetReader((jetReader))
    ttsChunks(ttsChunks)
    jetChunks(jetChunks)
    prompts(prompts)
    ttsDone(ttsDone)
    subgraph "NATS JetStream"
    Go(go)
    Rust(rust)
    end
    Go-- 1. -->jetReader
    jetWriter-- 7. -->Rust
    jetReader-- 2. -->prompts
    prompts-- 3. -->llm
    llm-->ollama
    llm-- 4. -->ttsChunks
    llm-- 4. -->jetChunks
    jetChunks-->jetWriter
    ttsChunks-->tts
    tts-- 5. -->playht
    tts-- 6. -->ttsDone
    ttsDone-->jetWriter
```
1. `jet.Reader` receives a message published on a JetStream subject
2. `jet.Reader` sends this message to the `prompts` channel
3. The `llm` worker reads the messages sent to the `prompts` channel and forwards them to Ollama for LLM generation
4. Ollama generates the response and the `llm` worker sends it to both the `ttsChunks` and `jetChunks` channels
5. The `tts` worker reads the message, sends it to the PlayHT API, and streams the generated audio to the default audio device
6. Once the playback has finished, the `tts` worker notifies `jet.Writer` via the `ttsDone` channel that it's done playing the audio
7. `jet.Writer` receives the notification on the `ttsDone` channel and publishes the message it received on the `jetChunks` channel to a JetStream subject
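The steps above can be sketched as a small Go program: the channel names mirror the diagram, while the Ollama call, PlayHT playback, and JetStream publish are replaced with stand-ins so the sketch runs on its own (this is an illustration of the flow, not the project's actual code):

```go
package main

import "fmt"

// runPipeline wires up the channels from the diagram and pushes a single
// prompt through them.
func runPipeline(prompt string) string {
	prompts := make(chan string)
	ttsChunks := make(chan string)
	jetChunks := make(chan string)
	ttsDone := make(chan struct{})
	published := make(chan string)

	// llm worker: forwards the prompt to the "LLM" and fans the response
	// out to both the tts worker and the jet writer.
	go func() {
		resp := "echo: " + <-prompts // stand-in for an Ollama generation
		ttsChunks <- resp
		jetChunks <- resp
	}()

	// tts worker: "plays" the audio, then signals it's done.
	go func() {
		<-ttsChunks // stand-in for streaming audio via PlayHT
		ttsDone <- struct{}{}
	}()

	// jet writer: publishes the response only once playback has finished.
	go func() {
		msg := <-jetChunks
		<-ttsDone
		published <- msg // stand-in for a JetStream publish
	}()

	prompts <- prompt
	return <-published
}

func main() {
	fmt.Println(runPipeline("Hi Rustbot!"))
}
```

Note how the `ttsDone` hand-off makes the publish wait for playback to finish, which is what keeps the two bots from talking over each other.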
There are a few prerequisites:
Both bots use NATS as their communication channel.
Install:

```shell
brew tap nats-io/nats-tools
brew install nats nats-server
```

Run:

```shell
nats-server -js
```

Or, with Nix:

```shell
nix-shell -p nats-server natscli
nats-server -js
```

To run the LLM you need Ollama: download it from the official site, or install it via Nix:

```shell
nix-shell -p ollama
```

Run a model of your choice:

```shell
ollama run llama2
```

If you are running on Linux you need to install the following libraries -- assuming you want to play with the bot-speaking service.
> [!NOTE]
> This is for Ubuntu Linux; other distros likely have different package names.

```shell
sudo apt install -y --no-install-recommends libasound2-dev pkg-config
```

Once you've created an account on PlayHT you need to generate API keys. See here for more details.
Now you need to export them via the following environment variables, which are read by the client libraries we use (go-playht, playht_rs):
```shell
export PLAYHT_SECRET_KEY=XXXX
export PLAYHT_USER_ID=XXX
```

> [!IMPORTANT]
> Once you've started gobot you need to prompt it.
> gobot reads the prompt from stdin, which kickstarts the conversation:
> rustbot waits for gobot before it responds!
Start the gobot:
```shell
go run ./gobot/...
```

Start the rustbot:

```shell
cargo run --manifest-path rustbot/Cargo.toml
```