Thanks to visit codestin.com
Credit goes to github.com

Skip to content

hyperholmes/AI-VTuber-System

 
 

Repository files navigation

🔥 Announcement 🔥

AI-VTuber-System 2 is currently in development. A brand-new user-friendly GUI interface. More comprehensive customization for your own AI VTuber. Will support local LLMs and TTS, And features rapid voice cloning functionality. The new project remains open-source. Look forward to it in 2026.

AI-VTuber-System

A graphical system program that allows you to quickly create your own AI VTuber for free. 00_程式視窗

Tutorial Video

https://www.youtube.com/watch?v=Hwss_p2Iroc

User Manual

https://docs.google.com/document/d/1na16cbaTVYin16BhvMQmeYYAZPwSyCoQ9sfcie3K-FQ/edit?usp=sharing

How to Obtain an API Key in Google AI Studio

https://drive.google.com/file/d/1WAwvtkWUyqnOu4QH-ZhlbuyONbQE4Ul1/view?usp=sharing

Installation Guide

Python >= 3.8, install the main dependencies:

pip3 install -r requirements.txt

For the specific PyTorch packages which require a special handling, use the following command:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Nvidia

Please update your Nvidia GPU driver to ensure support for CUDA 12.1.

Latest GPU Driver https://www.nvidia.com/Download/index.aspx

Whisper

Excerpt from https://github.com/openai/whisper

There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model. The relative speeds below are measured by transcribing English speech on a A100, and the real-world speed may vary significantly depending on many factors including the language, the speaking speed, and the available hardware.

Size Parameters English-only model Multilingual model Required VRAM Relative speed
tiny 39 M tiny.en tiny ~1 GB ~10x
base 74 M base.en base ~1 GB ~7x
small 244 M small.en small ~2 GB ~4x
medium 769 M medium.en medium ~5 GB ~2x
large 1550 M N/A large ~10 GB 1x
turbo 809 M N/A turbo ~6 GB ~8x

The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models. Additionally, the turbo model is an optimized version of large-v3 that offers faster transcription speed with a minimal degradation in accuracy.

Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

WER breakdown by language

Whisper Model Download Links

tiny.en: https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt

tiny: https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt

base.en: https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt

base: https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt

small.en: https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt

small: https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt

medium.en: https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt

medium: https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt

large-v1: https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt

large-v2: https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt

large-v3: https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

turbo: https://openaipublic.azureedge.net/main/whisper/models/aff26ae408abcba5fbf8813c21e62b0941638c5f6eebfb145be0c9839262a19a/large-v3-turbo.pt

Star History

Star History Chart

About

A graphical system program that allows you to quickly create your own AI VTuber for free.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%