AI-VTuber-System 2 is currently in development. It features a brand-new, user-friendly GUI, more comprehensive customization for your own AI VTuber, support for local LLMs and TTS, and rapid voice cloning. The new project remains open-source. Look forward to it in 2026.
A graphical program that lets you quickly create your own AI VTuber for free.

https://www.youtube.com/watch?v=Hwss_p2Iroc
https://docs.google.com/document/d/1na16cbaTVYin16BhvMQmeYYAZPwSyCoQ9sfcie3K-FQ/edit?usp=sharing
https://drive.google.com/file/d/1WAwvtkWUyqnOu4QH-ZhlbuyONbQE4Ul1/view?usp=sharing
Requires Python >= 3.8. Install the main dependencies:
pip3 install -r requirements.txt
For the PyTorch packages, which require special handling, use the following command:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Please update your Nvidia GPU driver to ensure support for CUDA 12.1.
Latest GPU Driver https://www.nvidia.com/Download/index.aspx
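After installing, you can confirm that the CUDA build of PyTorch is usable on your machine. The following is a minimal check (the reported CUDA version depends on the wheel you installed; cu121 should report 12.1):

```python
# quick_gpu_check.py -- verify that the CUDA build of PyTorch sees your GPU
import torch

print("PyTorch version:", torch.__version__)          # e.g. 2.x.x+cu121
print("CUDA available:", torch.cuda.is_available())   # should print True on a supported GPU

if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)         # expected to report 12.1 for the cu121 wheel
    print("GPU:", torch.cuda.get_device_name(0))
```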
Excerpt from https://github.com/openai/whisper
There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model. The relative speeds below are measured by transcribing English speech on an A100, and the real-world speed may vary significantly depending on many factors including the language, the speaking speed, and the available hardware.
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|---|---|---|---|---|---|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~10x |
| base | 74 M | base.en | base | ~1 GB | ~7x |
| small | 244 M | small.en | small | ~2 GB | ~4x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |
| turbo | 809 M | N/A | turbo | ~6 GB | ~8x |
The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
Additionally, the turbo model is an optimized version of large-v3 that offers faster transcription speed with a minimal degradation in accuracy.
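As a quick illustration of how these model sizes are used, here is a minimal transcription sketch based on the openai/whisper Python API (the audio file name is a placeholder; pick a model size from the table above according to your available VRAM):

```python
# transcribe_demo.py -- minimal example of transcribing an audio clip with Whisper
import whisper

# Smaller models need less VRAM and run faster, but are less accurate.
model = whisper.load_model("base")

# "audio.wav" is a placeholder path; replace it with your own recording.
result = model.transcribe("audio.wav")
print(result["text"])
```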
Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.