RWKV-SpeechChat is a real-time dialogue script built on a frozen 3B RWKV model with trained adapters and initial states. The corresponding training framework, which includes more detailed descriptions, is available at https://github.com/AGENDD/RWKV-ASR. Different sets of trained weights can be loaded to perform a range of audio tasks, including automatic speech recognition (ASR), speech translation, speech question answering (QA), and more.
- Multi-task audio support: Supports multiple audio tasks, including automatic speech recognition (ASR), speech translation (ST), and speech question answering (QA), with more coming soon.
- Local Deployment: Can be run on a PC with a GPU that has at least 6GB of video memory.
- Real-time Conversation: Supports real-time conversation with the model, similar to GPT-4.
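To check the 6 GB video-memory requirement before installing, the helper below can be used. It is a hypothetical sketch, not part of RWKV-SpeechChat. It assumes PyTorch with CUDA support is installed and reads `torch.cuda.get_device_properties(...).total_memory`; the 6 GB threshold is the only value taken from this README.

```python
from typing import Optional

# Threshold taken from the "at least 6GB of video memory" requirement.
MIN_VRAM_GB = 6


def has_enough_vram(total_bytes: int, min_gb: float = MIN_VRAM_GB) -> bool:
    """Return True if total_bytes is at least min_gb decimal gigabytes.

    Decimal GB (10**9 bytes) is used so that cards sold as "6 GB"
    (which usually report about 6 GiB) pass the check.
    """
    return total_bytes >= min_gb * 10**9


def query_cuda_vram(device_index: int = 0) -> Optional[int]:
    """Return total memory of a CUDA device in bytes, or None if unavailable.

    Hypothetical helper: assumes PyTorch; returns None when PyTorch is
    missing or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory


if __name__ == "__main__":
    total = query_cuda_vram()
    if total is None:
        print("No CUDA GPU detected (or PyTorch is not installed).")
    else:
        status = "OK" if has_enough_vram(total) else "below the 6 GB requirement"
        print(f"GPU memory: {total / 10**9:.1f} GB -> {status}")
```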
The following are some video demonstrations of the speech QA task in English and Chinese.
7e186dd6ab2c1619965c24e8440edd0c.mp4
a88f72623944f41ebb217a5d351469ff.mp4
a2e6a0bc69281b7391dc6521e4286f8b.mp4
Clone the repository:
git clone https://github.com/AGENDD/RWKV-SpeechChat.git
cd RWKV-SpeechChat
Download RWKV model weights:
Download the RWKV model weights from: https://huggingface.co/BlinkDL/rwkv-6-world/tree/main
This project currently supports only "RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth". Place the weights in the model/ directory.
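As an alternative to downloading through the browser, the base weights can be fetched from Python. This is a sketch, assuming the `huggingface_hub` package is installed and that the file sits at the root of the `BlinkDL/rwkv-6-world` repository linked above; `download_rwkv_weights` and `expected_rwkv_path` are hypothetical names, not part of this project. It places the file at model/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth, which matches the default --rwkv_path.

```python
from pathlib import Path

# Repository and filename from the download link and supported-model note above.
RWKV_REPO_ID = "BlinkDL/rwkv-6-world"
RWKV_FILENAME = "RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth"
MODEL_DIR = Path("model")


def expected_rwkv_path(model_dir: Path = MODEL_DIR) -> Path:
    """Location where main.py looks for the base weights by default."""
    return model_dir / RWKV_FILENAME


def download_rwkv_weights(model_dir: Path = MODEL_DIR) -> Path:
    """Hypothetical helper: download the base RWKV weights unless already present."""
    target = expected_rwkv_path(model_dir)
    if target.exists():
        return target  # skip the large download if the file is already in place
    # Imported lazily so the path helpers work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    model_dir.mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id=RWKV_REPO_ID,
        filename=RWKV_FILENAME,
        local_dir=str(model_dir),
    )
    return Path(local_path)


if __name__ == "__main__":
    print(download_rwkv_weights())
```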
Download trained weights:
Download the trained weights for the audio task you want to run:
- ASR: https://huggingface.co/JerryAGENDD/RWKV-ASR/tree/main/ASR
- ST: https://huggingface.co/JerryAGENDD/RWKV-ASR/tree/main/ST
- SpeechQA: https://huggingface.co/JerryAGENDD/RWKV-ASR/tree/main/SpeechQA
Place the weights in the model/ directory.
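The task folders listed above (ASR, ST, SpeechQA) can also be fetched selectively. The sketch below assumes the `huggingface_hub` package and uses `snapshot_download` with `allow_patterns` so that only one task folder is downloaded; `task_pattern` and `download_task_weights` are hypothetical helpers, not part of this project.

```python
from pathlib import Path

# Repository and task folders taken from the links above.
ADAPTER_REPO_ID = "JerryAGENDD/RWKV-ASR"
TASK_FOLDERS = {"asr": "ASR", "st": "ST", "speechqa": "SpeechQA"}


def task_pattern(task: str) -> str:
    """Map a task name (case-insensitive) to a glob matching its folder."""
    key = task.lower()
    if key not in TASK_FOLDERS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_FOLDERS)}")
    return f"{TASK_FOLDERS[key]}/*"


def download_task_weights(task: str, model_dir: str = "model") -> Path:
    """Hypothetical helper: download only the folder for one task into model_dir."""
    # Imported lazily so task_pattern works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=ADAPTER_REPO_ID,
        allow_patterns=[task_pattern(task)],
        local_dir=model_dir,
    )
    return Path(root) / TASK_FOLDERS[task.lower()]


if __name__ == "__main__":
    print(download_task_weights("SpeechQA"))
```

With this approach the files land in a subfolder such as model/SpeechQA/, so you may need to move the chosen .pth file into model/ or pass its location via --weights_path.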
Command-line arguments:
- --multiturns: Enable multi-turn conversation mode (omit this flag to disable it).
- --rwkv_path: Path to the RWKV model weights (default: model/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth).
- --weights_path: Path to the trained weights (default: model/rwkv-adapter-speechQA-VoiceAssistant-final.pth).
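The three flags and their defaults map naturally onto `argparse`. The parser below is an illustrative reconstruction from the argument list above, not the actual code in main.py; `build_parser` is a hypothetical name. It shows that --multiturns is a boolean switch while the two paths fall back to their defaults when omitted.

```python
import argparse

# Defaults copied from the argument list above.
DEFAULT_RWKV_PATH = "model/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth"
DEFAULT_WEIGHTS_PATH = "model/rwkv-adapter-speechQA-VoiceAssistant-final.pth"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(description="RWKV-SpeechChat (illustrative parser)")
    # Present -> True; omitted -> False (single-turn mode).
    parser.add_argument("--multiturns", action="store_true",
                        help="enable multi-turn conversation mode")
    parser.add_argument("--rwkv_path", default=DEFAULT_RWKV_PATH,
                        help="path to RWKV model weights")
    parser.add_argument("--weights_path", default=DEFAULT_WEIGHTS_PATH,
                        help="path to trained weights")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.multiturns, args.rwkv_path, args.weights_path)
```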
You can run the script with the following command:
python main.py --multiturns --rwkv_path path/to/your/model/weights.pth --weights_path path/to/your/trained/weights.pth
Or use the default parameters:
python main.py
Note that multi-turn conversation currently supports only the speech QA task. When "Inference start" appears, press the space bar to start recording and press it again to stop.
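The space-bar behavior described above is a simple push-to-talk toggle. The class below is a hypothetical, library-free sketch of that start/stop logic, not the repository's implementation. Key events and audio chunks are passed in by hand here; a real program would receive them from keyboard and microphone libraries and would presumably hand each completed clip to the model.

```python
class PushToTalk:
    """Hypothetical toggle: each Space press flips between idle and recording."""

    def __init__(self) -> None:
        self.recording = False
        self.clips = []       # finished recordings, each a list of audio chunks
        self._current = []    # chunks collected for the recording in progress

    def on_space(self) -> None:
        """Start recording if idle; otherwise stop and store the finished clip."""
        if self.recording:
            self.clips.append(self._current)
            self._current = []
        self.recording = not self.recording

    def on_audio(self, chunk) -> None:
        """Keep audio only while recording; chunks arriving while idle are dropped."""
        if self.recording:
            self._current.append(chunk)


if __name__ == "__main__":
    ptt = PushToTalk()
    ptt.on_audio("noise")   # ignored: not recording yet
    ptt.on_space()          # start
    ptt.on_audio("hello")
    ptt.on_audio("world")
    ptt.on_space()          # stop
    print(ptt.clips)
```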