RWKV-SpeechChat

RWKV-SpeechChat is a real-time dialogue script based on a frozen 3B RWKV model with trained adapters and initial states. The corresponding training framework, which includes more detailed descriptions, is available at https://github.com/AGENDD/RWKV-ASR. Various trained weights can be applied to perform a range of audio tasks, including automatic speech recognition (ASR), speech translation, speech question answering (QA), and more.

Features

  • Multi-Task Audio Support: Handles multiple audio tasks, including automatic speech recognition (ASR), speech translation, and speech question answering (QA), with more coming soon.
  • Local Deployment: Runs on a PC with a GPU that has at least 6 GB of video memory.
  • Real-time Conversation: Supports real-time conversation with the model, similar to GPT-4.

Demonstration

The videos below demonstrate the speech QA task in English and Chinese.

7e186dd6ab2c1619965c24e8440edd0c.mp4
a88f72623944f41ebb217a5d351469ff.mp4
a2e6a0bc69281b7391dc6521e4286f8b.mp4

Installation

  1. Clone the repository:

    git clone https://github.com/AGENDD/RWKV-SpeechChat.git
    cd RWKV-SpeechChat
  2. Download RWKV model weights:

    Download the RWKV model weights from: https://huggingface.co/BlinkDL/rwkv-6-world/tree/main

    This project currently supports "RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth" only. Place the weights in the model directory.

  3. Download trained weights:

    Download the trained weights corresponding to audio tasks:

    Place the weights in the model directory.
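
Before launching the script, it can help to confirm that both weight files are where the defaults expect them. The sketch below is a hypothetical helper, not part of the repository. It assumes the model directory sits in the current working directory and uses the two default file names documented in this README. The download URL is also an assumption, built from the usual Hugging Face resolve/main pattern.

```python
from pathlib import Path

# File names taken from the default paths documented in this README.
RWKV_WEIGHTS = "RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth"
ADAPTER_WEIGHTS = "rwkv-adapter-speechQA-VoiceAssistant-final.pth"

# Assumption: direct-download URL following Hugging Face's resolve/main pattern.
RWKV_URL = f"https://huggingface.co/BlinkDL/rwkv-6-world/resolve/main/{RWKV_WEIGHTS}"


def missing_weights(model_dir="model"):
    """Return the expected weight files that are not present in model_dir."""
    model_dir = Path(model_dir)
    expected = [RWKV_WEIGHTS, ADAPTER_WEIGHTS]
    return [name for name in expected if not (model_dir / name).is_file()]


missing = missing_weights()
if missing:
    print("Missing from model/:", ", ".join(missing))
    print("Base model download:", RWKV_URL)
else:
    print("All weight files found.")
```

If a different set of trained weights is used, replace ADAPTER_WEIGHTS with that file name.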

Usage

Command-line Arguments

  • --multiturns: Enable multi-turn conversation mode (omit this flag to disable it).
  • --rwkv_path: Path to RWKV model weights (default is model/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth).
  • --weights_path: Path to trained weights (default is model/rwkv-adapter-speechQA-VoiceAssistant-final.pth).
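
For readers who want to see how these three flags fit together, here is a minimal argparse sketch that mirrors the documented names and defaults. It is an illustration only: the function name build_parser is hypothetical, and the actual main.py may define its arguments differently.

```python
import argparse


def build_parser():
    """Parser mirroring the three documented flags; the real main.py may differ."""
    parser = argparse.ArgumentParser(description="RWKV-SpeechChat (illustrative parser)")
    # A boolean switch: present means multi-turn, absent means single-turn.
    parser.add_argument("--multiturns", action="store_true",
                        help="enable multi-turn conversation mode")
    parser.add_argument("--rwkv_path",
                        default="model/RWKV-x060-World-3B-v2.1-20240417-ctx4096.pth",
                        help="path to RWKV model weights")
    parser.add_argument("--weights_path",
                        default="model/rwkv-adapter-speechQA-VoiceAssistant-final.pth",
                        help="path to trained weights")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--multiturns"])
print(args.multiturns, args.rwkv_path, args.weights_path)
```

Modeling --multiturns as a store_true switch matches the documented behavior, where leaving the flag out disables multi-turn mode.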

Running the Script

You can run the script with the following command:

python main.py --multiturns --rwkv_path path/to/your/model/weights.pth --weights_path path/to/your/trained/weights.pth

Or use the default parameters:

python main.py

Note that multi-turn conversation currently supports only the speech QA task. When "Inference start" appears, press the space bar to start recording, and press it again to stop.
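
The space-bar control behaves like a toggle: one press starts a recording and the next press ends it. The sketch below shows that logic on its own, without any keyboard or audio library. The class and method names are hypothetical, and the repository's implementation may work differently.

```python
class PushToToggleRecorder:
    """Toggle recorder: one key press starts capture, the next one stops it."""

    def __init__(self):
        self.recording = False
        self._chunks = []

    def feed(self, chunk):
        """Store an audio chunk only while recording is active."""
        if self.recording:
            self._chunks.append(chunk)

    def on_space(self):
        """Handle a space press; return the captured chunks when a recording ends, else None."""
        if not self.recording:
            self._chunks = []      # discard audio from any previous recording
            self.recording = True
            return None
        self.recording = False
        return list(self._chunks)


recorder = PushToToggleRecorder()
recorder.feed(b"ignored")          # not recording yet, so this chunk is dropped
recorder.on_space()                # first press: start recording
recorder.feed(b"hello")
recorder.feed(b"world")
utterance = recorder.on_space()    # second press: stop and collect the audio
print(utterance)
```

In a real setup, feed would be called from a microphone callback and on_space from a keyboard listener; both are left out here to keep the example self-contained.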
