NOTE: This is a very early developer preview!
An open source toolkit for building voice assistants.
Rhasspy focuses on:
- Privacy - no data leaves your computer unless you want it to
- Broad language support - more than just English
- Customization - everything can be changed
- Check out the tutorial
- Connect Rhasspy to Home Assistant
- Install the Rhasspy 3 add-on
- Run one or more satellites
- Join the community
This is a developer preview, so there are lots of things missing:
- A user-friendly web UI
- An automated method for installing programs/services and downloading models
- Support for custom speech-to-text grammars
- Intent systems besides Home Assistant
- The ability to accumulate context within a pipeline
Rhasspy is organized by domain:
- `mic` - audio input
- `wake` - wake word detection
- `asr` - speech-to-text
- `vad` - voice activity detection
- `intent` - intent recognition from text
- `handle` - intent or text input handling
- `tts` - text-to-speech
- `snd` - audio output
Rhasspy talks to external programs using the Wyoming protocol. You can add your own programs by implementing the protocol or using an adapter.
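As a rough sketch of how such an event protocol can be framed (this is an assumption for illustration, not the canonical Wyoming implementation — see the protocol's own documentation for the real format), each event can be a JSON header line followed by an optional binary payload:

```python
import io
import json

def write_event(stream, event_type, data=None, payload=b""):
    """Write one event: a JSON header line, then an optional binary payload."""
    header = {"type": event_type, "data": data or {}}
    if payload:
        header["payload_length"] = len(payload)
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(payload)

def read_event(stream):
    """Read one event back: parse the header line, then read the payload."""
    header = json.loads(stream.readline())
    payload = stream.read(header.get("payload_length", 0))
    return header, payload

# Round-trip an audio chunk through an in-memory stream
buffer = io.BytesIO()
write_event(buffer, "audio-chunk",
            {"rate": 16000, "width": 2, "channels": 1},
            b"\x00\x01" * 160)
buffer.seek(0)
header, payload = read_event(buffer)
```

Framing events as newline-delimited JSON plus raw payload bytes keeps the protocol easy to implement from any language over stdin/stdout or a socket.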
## Adapters

Adapters are small scripts in `bin/` that bridge existing programs into the Wyoming protocol.
For example, a speech-to-text program (`asr`) that accepts a WAV file and outputs text can use `asr_adapter_wav2text.py`.
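The adapter pattern amounts to shelling out to the wrapped program and relaying its output. A minimal sketch (the `run_wav2text` helper and the command template are hypothetical, not the actual `asr_adapter_wav2text.py`):

```python
import shlex
import subprocess

def run_wav2text(command: str, wav_path: str) -> str:
    """Run an external WAV-to-text program and return its transcript.

    `command` is a template with {wav_path} substituted, e.g.
    "my-asr --file {wav_path}".
    """
    args = shlex.split(command.format(wav_path=shlex.quote(wav_path)))
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Using a stand-in "program" (echo) just to show the call shape:
transcript = run_wav2text("echo transcript for {wav_path}", "/tmp/utterance.wav")
```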
## Pipelines

A pipeline is a complete voice loop from microphone input (`mic`) to speaker output (`snd`). Stages are:
- `detect` (optional) - wait until the wake word is detected in `mic`
- `transcribe` - listen until `vad` detects silence, then convert audio to text
- `recognize` (optional) - recognize an intent from the text
- `handle` - handle an intent or text, producing a text response
- `speak` - convert the handler's output text to speech, and speak through `snd`
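The stage flow can be sketched as a chain of functions. Everything below is a toy stand-in (canned transcript, canned intent), not Rhasspy's actual implementation:

```python
def transcribe(audio: bytes) -> str:
    return "turn on the light"                 # asr stand-in

def recognize(text: str) -> dict:
    return {"intent": "TurnOn", "text": text}  # intent stand-in

def handle(intent: dict) -> str:
    return f"OK, handled {intent['intent']}"   # handler stand-in

def speak(response: str) -> bytes:
    return response.encode("utf-8")            # tts/snd stand-in

def run_pipeline(audio: bytes) -> bytes:
    # detect (optional) would gate this on a wake word first
    text = transcribe(audio)       # transcribe
    intent = recognize(text)       # recognize (optional)
    response = handle(intent)      # handle
    return speak(response)         # speak

output = run_pipeline(b"\x00\x01")
```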
## Servers

Some programs take a while to load, so it's best to leave them running as a server. Use `bin/server_run.py`, or add `--server <domain> <name>` when running the HTTP server.
See the `servers` section of the `configuration.yaml` file.
- `mic`
- `wake`
- `vad`
- `asr`
- `handle`
- `tts`
- `snd`
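As a hypothetical sketch of what a `servers` entry might look like (assuming a domain → server name → command layout, mirroring how programs are configured; the server name and command here are illustrative — check `configuration.yaml` in the repository for the real schema):

```yaml
servers:
  asr:
    my-asr-server:      # illustrative name
      command: |
        script/server
```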
## HTTP API

Endpoints are available at `http://localhost:13331/<endpoint>`. Unless overridden, the pipeline named `default` is used.
- `/pipeline/run` - runs a full pipeline from `mic` to `snd`
    - Produces JSON
    - Override `pipeline` or: `wake_program`, `asr_program`, `intent_program`, `handle_program`, `tts_program`, `snd_program`
    - Skip stages with `start_after`:
        - `wake` - skip detection; body is the detection name (text)
        - `asr` - skip recording; body is the transcript (text) or WAV audio
        - `intent` - skip recognition; body is an intent/not-recognized event (JSON)
        - `handle` - skip handling; body is a handle/not-handled event (JSON)
        - `tts` - skip synthesis; body is WAV audio
    - Stop early with `stop_after`:
        - `wake` - only detection
        - `asr` - detection and transcription
        - `intent` - detection, transcription, recognition
        - `handle` - detection, transcription, recognition, handling
        - `tts` - detection, transcription, recognition, handling, synthesis
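Assembling the query parameters for a partial pipeline run can be sketched as below. `pipeline_run_url` is a hypothetical helper for building the URL only; actually running the pipeline requires a POST to a live server:

```python
from urllib.parse import urlencode

def pipeline_run_url(base: str = "http://localhost:13331", **params) -> str:
    """Build a /pipeline/run URL with the given query parameters."""
    query = urlencode(params)
    return f"{base}/pipeline/run" + (f"?{query}" if query else "")

# Stop after transcription to get just the transcript back:
url = pipeline_run_url(stop_after="asr", pipeline="default")
```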
- `/wake/detect` - detects a wake word in WAV input
    - Produces JSON
    - Override `wake_program` or `pipeline`
- `/asr/transcribe` - transcribes audio from WAV input
    - Produces JSON
    - Override `asr_program` or `pipeline`
- `/intent/recognize` - recognizes an intent from the text body (POST) or `text` query parameter (GET)
    - Produces JSON
    - Override `intent_program` or `pipeline`
- `/handle/handle` - handles an intent or text from the body (POST) or `input` query parameter (GET)
    - `Content-Type` must be `application/json` for intent input
    - Override `handle_program` or `pipeline`
- `/tts/synthesize` - synthesizes audio from the text body (POST) or `text` query parameter (GET)
    - Produces WAV audio
    - Override `tts_program` or `pipeline`
- `/tts/speak` - synthesizes and plays audio from the text body (POST) or `text` query parameter (GET)
    - Produces JSON
    - Override `tts_program`, `snd_program`, or `pipeline`
- `/snd/play` - plays WAV audio via `snd`
    - Override `snd_program` or `pipeline`
- `/config` - returns the JSON config
- `/version` - returns version info
## Websocket API

Endpoints are available at `ws://localhost:13331/<endpoint>`.

Audio streams are raw PCM in binary messages. Use the `rate`, `width`, and `channels` query parameters to set the sample rate (hertz), sample width (bytes), and channel count. By default, input audio is 16 kHz 16-bit mono, and output audio is 22 kHz 16-bit mono.

The client can end the audio stream by sending an empty binary message.
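Sizing the binary messages follows directly from the PCM format parameters. A small sketch (helper names are illustrative):

```python
def bytes_per_second(rate: int, width: int, channels: int) -> int:
    """Size of one second of raw PCM audio at the given format."""
    return rate * width * channels

def chunk_size(rate: int, width: int, channels: int, ms: int) -> int:
    """Size of a binary message carrying `ms` milliseconds of audio."""
    return bytes_per_second(rate, width, channels) * ms // 1000

# Default input format: 16 kHz sample rate, 16-bit (2-byte) samples, mono
default_in = bytes_per_second(16000, 2, 1)   # bytes per second of input audio
thirty_ms = chunk_size(16000, 2, 1, 30)      # a typical small streaming chunk
```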
- `/pipeline/asr-tts` - runs a pipeline from asr (stream in) to tts (stream out)
    - Produces JSON messages as events happen
    - Override `pipeline` or: `asr_program`, `vad_program`, `handle_program`, `tts_program`
    - Use `in_rate`, `in_width`, `in_channels` for the audio input format
    - Use `out_rate`, `out_width`, `out_channels` for the audio output format
- `/wake/detect` - detects a wake word from a websocket audio stream
    - Produces a JSON message when the audio stream ends
    - Override `wake_program` or `pipeline`
- `/asr/transcribe` - transcribes a websocket audio stream
    - Produces a JSON message when the audio stream ends
    - Override `asr_program` or `pipeline`
- `/snd/play` - plays a websocket audio stream
    - Produces a JSON message when the audio stream ends
    - Override `snd_program` or `pipeline`