Korean version here → README-ko.md
This repository is a fork of ouor/vits, modified to support CUDA 12.8 (cu128)
and PyTorch ≥ 2.x, required for RTX 50xx (Blackwell) GPUs.

```
git clone https://github.com/ahnhs2k/vits.git
cd vits
```
- Fill in "text_cleaners" in config.json
- "text_cleaners" is initially set to 'korean_cleaners'. To use a different cleaner:
  - Edit text/symbols.py
  - Remove unnecessary imports from text/cleaners.py
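For reference, the relevant fields might sit together in config.json like this (a sketch following the upstream VITS layout; the exact nesting in this fork may differ, and all values besides the field names mentioned in this README are illustrative):

```json
{
  "data": {
    "text_cleaners": ["korean_cleaners"],
    "sampling_rate": 22050,
    "n_speakers": 0,
    "cleaned_text": false
  }
}
```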
Windows

```
python -m venv .venv
.\.venv\Scripts\activate
```

Linux / WSL

```
uv venv --python 3.10
source .venv/bin/activate
```

Install the cu128 nightly build of PyTorch, then the remaining requirements:

```
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
pip install -r requirements.txt
```

Note: requirements.txt does NOT include PyTorch. Make sure PyTorch is installed before running the second command. If an error occurs while installing requirements on Windows, install Visual Studio Build Tools and try again.
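To confirm the installed wheel actually targets cu128, you can inspect `torch.__version__`, whose build tag ends in `+cu128` for these nightlies (a minimal sketch; the helper name and the exact version-string format are assumptions):

```python
def targets_cu128(torch_version: str) -> bool:
    """Check whether a torch version string (e.g. torch.__version__)
    carries the '+cu128' build tag, i.e. was built against CUDA 12.8."""
    return torch_version.split("+")[-1] == "cu128"

# Typical usage (with torch installed):
#   import torch
#   assert targets_cu128(torch.__version__), torch.__version__

print(targets_cu128("2.6.0.dev20250101+cu128"))  # True: nightly cu128 wheel
print(targets_cu128("2.5.1+cu121"))              # False: wrong CUDA build
```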
```
cd monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
cd ..
```

Note: Windows-native Python may fail to build this module; WSL2 or a Linux environment is strongly recommended.
- All wav files must match the "sampling_rate" in config.json (recommended: 22050 Hz / mono / PCM_16)
- For single-speaker training, "n_speakers" should be 0 in config.json
```
path/to/XXX.wav|transcript
```

- Example

```
dataset/001.wav|こんにちは。
```
Speaker ids should start from 0

```
path/to/XXX.wav|speaker id|transcript
```

- Example

```
dataset/001.wav|0|こんにちは。
```
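The two filelist formats above can be parsed with a small helper (a sketch; the function name and error handling are my own, not part of this repo):

```python
def parse_filelist_line(line: str, multi_speaker: bool = False):
    """Split one filelist line into its fields.

    Single speaker: path|transcript      -> (path, text)
    Multi speaker:  path|speaker id|text -> (path, speaker_id, text)
    """
    parts = line.rstrip("\n").split("|")
    if multi_speaker:
        # Re-join the tail so transcripts containing '|' survive intact.
        return parts[0], int(parts[1]), "|".join(parts[2:])
    return parts[0], "|".join(parts[1:])

print(parse_filelist_line("dataset/001.wav|こんにちは。"))
# ('dataset/001.wav', 'こんにちは。')
print(parse_filelist_line("dataset/001.wav|0|こんにちは。", multi_speaker=True))
# ('dataset/001.wav', 0, 'こんにちは。')
```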
This step is OPTIONAL. If your text is already normalized and "cleaned_text": true is set in config.json, you can skip preprocess.py.

If you need to randomly pick from the full filelist:

```
python random_pick.py --filelist path/to/filelist.txt
```

```
# Single speaker
python preprocess.py --text_index 1 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners 'korean_cleaners'

# Multiple speakers
python preprocess.py --text_index 2 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners 'korean_cleaners'
```

Once this is done, set "cleaned_text" to true in config.json.
- It is recommended to start from a pretrained model (available on huggingface.co)
- If your VRAM is limited (less than 40GB):
  - do not train at 44100 Hz; 22050 Hz is good enough
  - keep each audio clip short (recommended maximum: 4 seconds per clip)
```
# Single speaker
python train.py -c <config> -m <folder>

# Multiple speakers
python train_ms.py -c <config> -m <folder>
```

To train from a pretrained model, place 'G_0.pth' and 'D_0.pth' in the destination folder before running the train command.
Monitor training:

```
tensorboard --logdir checkpoints/<folder> --port 6006
```

Inference server:

```
python server.py --config_path path/to/config.json --model_path path/to/model.pth
```

Docker:

```
docker run -itd --gpus all --name "Container name" -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all "Image name"
```

Changes in this fork:

- Updated AMP / autocast usage for torch>=2.6
- Fixed mel/STFT dimension errors
- Stable training on cu128 + DDP
- Verified long-run convergence (70+ epochs)