
LLMAvatarTalk: An Interactive AI Assistant

LLMAvatarTalk is an innovative project that combines state-of-the-art AI technologies to create an interactive virtual assistant. By integrating automatic speech recognition (ASR), large language models (LLMs), LangChain, text-to-speech (TTS), audio-driven facial animation (Audio2Face), and Unreal Engine's Metahuman, LLMAvatarTalk showcases the potential of AI for seamless and engaging human-computer interaction.

English | 中文

Demo

Watch the demo on YouTube.

Features

  • Speech Recognition: Converts user speech into text in real time using NVIDIA RIVA ASR.
  • Language Processing: Leverages advanced LLMs (such as llama3-70b-instruct) via NVIDIA NIM APIs for deep semantic understanding and response generation.
  • Text-to-Speech: Transforms generated text responses into natural-sounding speech using NVIDIA RIVA TTS.
  • Facial Animation: Generates realistic facial expressions and animations from the synthesized audio using Audio2Face.
  • Unreal Engine Integration: Enhances virtual character expressiveness by linking Audio2Face to Unreal Engine's Metahuman in real time.
  • LangChain Integration: Simplifies the integration of NVIDIA RIVA and NVIDIA NIM APIs, providing a seamless and efficient workflow for AI development.
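
Taken together, these components form a single interaction loop: speech is transcribed, the transcript is answered by the LLM, the reply is synthesized back into speech, and that audio drives the avatar. The sketch below is purely conceptual; the function names are placeholders for illustration, not the project's actual interfaces.

    # Conceptual sketch of the interaction loop (placeholder functions, not the real API)
    def transcribe(audio: bytes) -> str:
        """RIVA ASR: speech audio -> text."""
        raise NotImplementedError

    def generate_reply(prompt: str) -> str:
        """LLM via NVIDIA NIM (e.g. llama3-70b-instruct): prompt -> reply text."""
        raise NotImplementedError

    def synthesize(text: str) -> bytes:
        """RIVA TTS: text -> speech audio."""
        raise NotImplementedError

    def animate(audio: bytes) -> None:
        """Stream audio into Audio2Face to drive the Metahuman."""
        raise NotImplementedError

    def run_turn(user_audio: bytes) -> None:
        text = transcribe(user_audio)
        reply = generate_reply(text)
        speech = synthesize(reply)
        animate(speech)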

Architecture

Prerequisites

Installation

Tested Environment: Windows 11 & Python 3.9

git clone https://github.com/yourusername/LLMAvatarTalk.git
cd LLMAvatarTalk
pip install -r requirements.txt

Execution

  1. Ensure you have set up the Riva server and configured Audio2Face and Unreal Engine.
  2. Create a .env file and add your NVIDIA NIM API key. A sample is provided in .env.sample.
    NVIDIA_API_KEY=nvapi-
    
  3. Set the URI field in config.py to your Riva server's address; the default Riva port is 50051 (a short usage sketch follows this list).
    URI = '192.168.1.205:50051'
    
  4. In config.py you can also set the language used for the interface and responses: 'en-US' for English or 'zh-CN' for Chinese. The default is English.
    LANGUAGE = 'en-US'  # Change to 'zh-CN' for Chinese.
    
  5. Run python main.py
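
For reference, the snippet below shows one way the API key and Riva URI are typically consumed, assuming the python-dotenv, langchain-nvidia-ai-endpoints, and nvidia-riva-client packages; the class names and model id are illustrative and may differ from what main.py actually uses.

    # Illustrative only -- shows how the API key and Riva URI could be wired up.
    from dotenv import load_dotenv                         # assumed dependency
    from langchain_nvidia_ai_endpoints import ChatNVIDIA   # assumed dependency
    import riva.client                                     # assumed dependency (nvidia-riva-client)

    load_dotenv()  # exposes NVIDIA_API_KEY from .env to the NIM client

    # LLM via NVIDIA NIM; the model id is an example
    llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

    # RIVA ASR / TTS clients pointed at the URI from config.py
    auth = riva.client.Auth(uri="192.168.1.205:50051", use_ssl=False)
    asr = riva.client.ASRService(auth)
    tts = riva.client.SpeechSynthesisService(auth)

    print(llm.invoke("Say hello in one sentence.").content)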

To-Do List

  • Optimize LLM functionality, including adding RAG and agent support
  • Improve TTS
  • Implement emotion detection and full-body animation
  • Integrate asynchronous processing
  • Integrate RIVA ASR and TTS through LangChain (currently using a temporary alternative)

Acknowledgments

Special thanks to the following projects and documentation:

  • RIVA
  • Audio2Face
  • LangChain

