RWKV-SpeechChat

RWKV-SpeechChat is a real-time dialogue script built on a frozen 3B RWKV model with trained adapters and initial states. The corresponding training framework, which includes more detailed descriptions, is available at https://github.com/AGENDD/RWKV-ASR. Different trained weights can be loaded to perform a range of audio tasks, including automatic speech recognition (ASR), speech translation, speech question answering (QA), and more.

Features

Multi-Task Audio Support: Handles automatic speech recognition (ASR), speech translation, and speech question answering (QA), with more tasks coming soon.
Local Deployment: Can be run on a PC with a GPU that has at least 6GB of video memory.
Real-time Conversation: Supports real-time conversation with the model, similar to GPT-4.

Demonstration

The videos folder contains video demonstrations of the speech QA task in English and Chinese.

7e186dd6ab2c1619965c24e8440edd0c.mp4

a88f72623944f41ebb217a5d351469ff.mp4

a2e6a0bc69281b7391dc6521e4286f8b.mp4

Installation

Clone the repository:

git clone https://github.com/AGENDD/RWKV-SpeechChat.git
cd RWKV-SpeechChat

Download RWKV model weights:

Download the RWKV model weights from: https://huggingface.co/BlinkDL/rwkv-6-world/tree/main

This project currently supports only "RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth". Place the weights in the model directory.

Download trained weights:

Download the trained weights corresponding to audio tasks:
Place the weights in the model directory.
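Before running the script, it can help to confirm that both weight files are where the defaults expect them. The following is a minimal sketch, not part of the repository: the helper name missing_weights is hypothetical, and the filenames are the defaults listed under "Command-line Arguments". If you use weights for a different audio task, pass your own filenames.

```python
from pathlib import Path

# Filenames taken from the default values of --rwkv_path and --weights_path.
DEFAULT_WEIGHT_FILES = (
    "RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth",
    "rwkv-adapter-speechQA-VoiceAssistant-final.pth",
)


def missing_weights(model_dir="model", filenames=DEFAULT_WEIGHT_FILES):
    """Return the expected weight files that are not present in model_dir.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(model_dir)
    return [name for name in filenames if not (root / name).is_file()]


# Check the default model directory relative to the current working directory.
missing = missing_weights()
if missing:
    print("Missing weight files in model/:", ", ".join(missing))
else:
    print("All expected weight files found.")
```

Run it from the repository root so that the relative model path resolves correctly.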

Usage

Command-line Arguments

--multiturns: Enable multi-turn conversation mode (omit this flag to run in single-turn mode).
--rwkv_path: Path to RWKV model weights (default is model/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth).
--weights_path: Path to trained weights (default is model/rwkv-adapter-speechQA-VoiceAssistant-final.pth).
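The three options above map naturally onto Python's argparse. The sketch below shows one way such a parser could be declared; it is an illustration under the assumption that --multiturns is a boolean switch, and the actual definitions in main.py may differ. The function name build_parser and the help strings are hypothetical; the flag names and default paths are the ones documented above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="RWKV-SpeechChat options (sketch)")
    # Assumed to be a boolean switch: present enables multi-turn, absent disables it.
    parser.add_argument(
        "--multiturns",
        action="store_true",
        help="enable multi-turn conversation mode",
    )
    parser.add_argument(
        "--rwkv_path",
        default="model/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth",
        help="path to the RWKV model weights",
    )
    parser.add_argument(
        "--weights_path",
        default="model/rwkv-adapter-speechQA-VoiceAssistant-final.pth",
        help="path to the trained adapter weights",
    )
    return parser


# Parse an explicit argument list instead of sys.argv to show the behavior.
args = build_parser().parse_args(["--multiturns"])
print(args.multiturns, args.rwkv_path, args.weights_path)
```

With this layout, running without any flags falls back to the two default paths and leaves multi-turn mode off.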

Running the Script

You can run the script with the following command:

python main.py --multiturns --rwkv_path path/to/your/model/weights.pth --weights_path path/to/your/trained/weights.pth

Or use the default parameters:

python main.py

Note that multi-turn conversation is currently supported only for speech QA. When "Inference start" appears, press the space bar to start recording, and press it again to stop.
