Max Headbox is an open-source voice-activated LLM Agent designed to run on a Raspberry Pi. It can be configured to execute a variety of tools and perform actions.
 Read my blog post about this project!
To get Max Headbox up and running, you'll need the following hardware:
- Raspberry Pi 5 (tested on a 16GB and 8GB model)
- A microphone is necessary for voice commands. (I've used this one from Amazon)
- GeeekPi Screen, Case, and Cooler: This all-in-one bundle from Amazon provides a screen, a protective case, and an active cooler to keep your Raspberry Pi running smoothly. (This bundle is optional but definitley use an active cooler!)
If you don't want to replicate the exact box form factor, you can still run it anywhere you want, just make sure you have about 6GB available to run the LLMs.
Ensure you have the following software installed before proceeding with the setup:
- Ruby 3.3.0
- Node 22
- Python 3
- Ollama
Follow these steps to get Max Headbox set up and ready to run.
git clone https://github.com/syxanash/maxheadbox.git
cd maxheadboxnvm use
npm installNavigate to the backend/ directory and install the required Ruby and Python packages.
cd backend/
bundle install
pip3 install -r requirements.txtAfter installing Ollama, pull the necessary language models:
ollama pull gemma3:1b
ollama pull qwen3:1.7bIn the settings select expose Ollama to the network:
sudo systemctl edit ollama.service
Enter the following conf:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
and then restart with: sudo systemctl daemon-reload && sudo systemctl restart ollama
Before starting the app, you need to configure the following variables in your .env file:
VITE_BACKEND_URL=http://192.168.0.1:4567
VITE_WEBSOCKET_URL=ws://192.168.0.1:4567
VITE_OLLAMA_URL=http://192.168.0.1:11434The first two variables use the same address since the WebSocket app also runs on Sinatra. If your Ollama instance is running on a different device, you'll need to specify its network address.
By default the recording directory is /dev/shm/whisper_recordings if you're developing and running the project on a different OS you can change this in your env file e.g.
RECORDINGS_DIR="~/Desktop/whisper_recordings"
To start the Max Headbox agent, run the following command from the root of the project directory:
npm run start-prodYou should now be able to see the app running on localhost. For development instead run:
npm run start-devCreating tools is as simple as making a JavaScript module in src/tools/ that exports an object with four properties: the tool's name, the parameters passed to the function, a describe field, and the function's main execution body.
Some frontend tools may require backend API handlers to fetch information from the Pi hardware (since the frontend cannot query it directly) and expose it via REST. I created a folder in backend/notions/ where I placed all these Ruby Sinatra routes.
Take a look at what's already there to have an idea.
The tools with the .txt extension are provided for reference. If you want to import them into the agent, just rename the extension to .js or .rb for the backend ones.
If you consider certain tools to be dangerous and want additional confirmation before the agent executes them, you can set the property dangerous: true when creating a new tool. When the model selects this tool, it will ask for your confirmation before executing it. Simply reply with YES or NO. Checkout the demo video with the light bulb to see how this confirmation flow works!
- The voice activation was achieved using Vosk.
- faster-whisper: Used for efficient and accurate voice transcription. For a detailed guide on setting it up locally, check out this this tutorial!
- The animated character in the UI was created by slightly modifying Microsoft's beautiful Fluent Emoji set.
Yes, I know, I should've made the whole backend layer in Python. It would've made more sense, but I didn't feel comfortable writing in Python since it's not my primary language, and I didn't want to just vibecode it.
I'm aware of Ollama's shady practices and the issues with llama.cpp's creator. Eventually, I will migrate, but for now it served its purpose for rapid prototyping my project. I've read it's even more performant, so yes, I'll definitely migrate (maybe).
I wanted the web app to be the most important part of the project, containing the logic of the actual Agent. I thought of using the Ruby+Python backend layer only for interacting with the Raspberry Pi hardware, it could easily be rewritten in a different stack and reconnected to the frontend if needed. Check the architecture diagram here.
Yes for sure, but after extensive testing, I noticed that the performance impact isn't very significant. To be completely honest, at most it might save a few seconds before the LLM completes its job. I'd rather have a nice UI feedback showing that something is happening, rather than a black screen while the LLM is processing (a small tradeoff). Happy to be proven wrong tho!
Great idea. When I have time, I'll definitely look into it. For now, I just wanted to make the wake-word system work, and that's it.
Fantastic question, thanks for asking! Check out my blog post to see why I went with redefining a function payload for invoking tools instead of using the tools' APIs directly.
No, if the quality of the code is shite, it's entirely my doing, completely organic, don't worry.
Jokes aside, the only tools I've created using Copilot are weather.rb and wiki.rb, because I wanted something quick to test my Agent.
Dinner is ready. For any more questions, my assistant will take it from here alternatively open a GitHub issue. Have a good night!