
Platform that uses artificial intelligence to generate video subtitles. The application offers transcription, translation, and subtitle customization, and can be run locally with or without a GPU using Docker.


Leg2Sub


About · Features · Requirements · Installation · Usage · Troubleshooting · Contributing · License


About

Leg2Sub breaks down language barriers in education. With only about 1% of Brazil's population fluent in English, while much high-quality content remains concentrated in that language, Leg2Sub provides free, powerful subtitle generation, translation, and customization.

Built with Streamlit and powered by Whisper/WhisperX, the platform transcribes, translates, and synchronizes subtitles automatically. Deploy on CPU for accessibility or GPU for high-performance processing.


Features

  • 🎯 Video Subtitling - Automatic transcription with embedded subtitles (soft & hard)
  • 🌐 Translation - Multi-language support via Google Translate
  • 📝 Transcription - Whisper and WhisperX engines
  • 🎨 Subtitle Customization - Color adjustment and formatting
  • 🐳 Multi-Platform - Unified Docker with CPU and GPU variants
  • ⚙️ Advanced Config - Fine-tune transcription, translation, and video parameters
  • 🚀 Production Ready - Error handling and batch processing support

Requirements

Core

  • Python: 3.10 - 3.12 (strongly recommended)
  • Docker & Docker Compose: Any recent version
  • WSL 2 (Windows only): For Docker support
  • GPU (Optional): NVIDIA with CUDA 11.0+ support

System Resources

Resource   Minimum   Recommended
Memory     2 GB      8 GB
Disk       15 GB     30 GB
CPU        2 cores   4+ cores

Installation

Option 1: Docker (Recommended, ~2 minutes)

CPU Version:

git clone https://github.com/Paulogb98/Leg2Sub.git
cd Leg2Sub

docker compose up leg2sub_cpu -d

GPU Version (NVIDIA):

# Prerequisites: Install NVIDIA Container Toolkit
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

docker compose up leg2sub_gpu -d

Monitor Startup:

docker compose logs -f leg2sub_cpu
# or for GPU:
docker compose logs -f leg2sub_gpu

Access Application: Open browser to http://localhost:8501
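
Before opening the browser, you can confirm that the UI is actually listening. A minimal check in Python, assuming the default port mapping from docker-compose.yml (a hypothetical helper, not part of the project):

```python
import socket

def is_listening(host: str = "localhost", port: int = 8501, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    status = "up" if is_listening() else "not reachable yet"
    print(f"Leg2Sub UI is {status} on http://localhost:8501")
```

If the port is not reachable, the container is likely still downloading models on first start; check the logs as shown above.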

Option 2: Local Python

git clone https://github.com/Paulogb98/Leg2Sub.git
cd Leg2Sub

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

pip install -r requirements.txt

streamlit run app/streamlit.py

Setup Guide (Windows + WSL 2 + Docker)

Step 1: Enable WSL 2

Open PowerShell as Administrator:

wsl --install -d Ubuntu

Step 2: Install Docker in WSL

Open Ubuntu terminal:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

sudo usermod -aG docker $USER
newgrp docker

Step 3: Install NVIDIA Container Toolkit (GPU Only)

# Follow official guide:
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

# Verify installation:
dpkg -l | grep nvidia-container-toolkit

# Test GPU support:
docker run --rm --gpus all nvidia/cuda:11.0-runtime nvidia-smi

Step 4: Clone and Run

cd /mnt/c/Users/<username>/Downloads
git clone https://github.com/Paulogb98/Leg2Sub.git
cd Leg2Sub

# CPU version
docker compose up leg2sub_cpu -d

# GPU version
docker compose up leg2sub_gpu -d

Usage

Main Features

Feature              Purpose                          Input           Output
Subtitle Video       Transcribe and embed subtitles   MP4, MOV, AVI   MP4 + SRT
Translate Subtitles  Translate subtitle files         SRT, VTT        SRT (translated)
Transcribe Video     Extract text from audio          MP4, MOV, AVI   Text (displayed)
Color Subtitles      Customize subtitle colors        SRT             ASS (colored)
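
The Color Subtitles feature maps SRT input to ASS output. As a rough illustration of what that conversion involves (these helpers are illustrative sketches, not the app's actual implementation), two format details matter: ASS timestamps use centisecond precision with a single hour digit, and ASS colour overrides store bytes in BGR order:

```python
def srt_time_to_ass(ts: str) -> str:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to ASS format (H:MM:SS.cc)."""
    h, m, s_ms = ts.split(":")
    s, ms = s_ms.split(",")
    centis = int(ms) // 10  # milliseconds -> centiseconds (precision loss)
    return f"{int(h)}:{m}:{s}.{centis:02d}"

def color_tag(rgb_hex: str) -> str:
    """Build an ASS inline colour override; ASS stores colours as &HBBGGRR&."""
    r, g, b = rgb_hex[0:2], rgb_hex[2:4], rgb_hex[4:6]
    return f"{{\\c&H{b}{g}{r}&}}"
```

For example, `srt_time_to_ass("00:01:02,500")` yields `0:01:02.50`, and `color_tag("FF0000")` (pure red) yields `{\c&H0000FF&}` because of the byte reversal.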

Quick Start

  1. Access: http://localhost:8501
  2. Select Feature: Choose from navigation menu
  3. Upload File: Select media or subtitle file
  4. Configure (Optional): Adjust parameters in Advanced Settings
  5. Process: Click button and wait for completion
  6. Download: Results saved to temp_output/ folder

Advanced Parameters

Transcription:

  • Engine: Whisper, WhisperX
  • Model: tiny, small, medium, large, large-v3-turbo
  • Device: auto, cpu, cuda
  • Batch Size: 1-32 (default: 12)
  • Compute Type: float32, float16, int8
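
These parameters can be sanity-checked before a job starts. A hedged sketch of such validation (the class and constants below are illustrative assumptions, not taken from the codebase):

```python
from dataclasses import dataclass

VALID_ENGINES = {"whisper", "whisperx"}
VALID_MODELS = {"tiny", "small", "medium", "large", "large-v3-turbo"}
VALID_DEVICES = {"auto", "cpu", "cuda"}
VALID_COMPUTE = {"float32", "float16", "int8"}

@dataclass
class TranscriptionConfig:
    engine: str = "whisperx"
    model: str = "small"
    device: str = "auto"
    batch_size: int = 12       # 1-32, per the list above
    compute_type: str = "float16"

    def __post_init__(self):
        if self.engine not in VALID_ENGINES:
            raise ValueError(f"unknown engine: {self.engine}")
        if self.model not in VALID_MODELS:
            raise ValueError(f"unknown model: {self.model}")
        if self.device not in VALID_DEVICES:
            raise ValueError(f"unknown device: {self.device}")
        if not 1 <= self.batch_size <= 32:
            raise ValueError("batch_size must be between 1 and 32")
        if self.compute_type not in VALID_COMPUTE:
            raise ValueError(f"unknown compute_type: {self.compute_type}")
```

Note that float16 compute generally requires CUDA; on CPU, float32 or int8 are the usual choices.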

Video Encoding:

  • Video Codec: h264, hevc, mpeg4
  • Audio Codec: aac, libopus, libmp3lame
  • Hardware APIs: nvenc, vaapi, amf, qsv (auto-detected)

Translation:

  • Languages: Portuguese, English, Spanish, French, German, Chinese, Japanese, Korean, and 20+ more
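
Free web translation endpoints typically cap request size (Google Translate is commonly cited at around 5,000 characters per request), so subtitle lines are usually sent in batches rather than one request per line. A generic batching helper to illustrate the idea (not the app's actual code):

```python
def batch_lines(lines, max_chars=4500):
    """Group subtitle lines into batches whose joined length stays under max_chars."""
    batches, current, size = [], [], 0
    for line in lines:
        # +1 accounts for the newline used to join lines in a request
        if current and size + len(line) + 1 > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        batches.append(current)
    return batches
```

Each batch can then be translated in a single call and split back on newlines, which is far faster than translating cue by cue.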

Troubleshooting

Container won't start

Check logs:

docker compose logs leg2sub_cpu
# or
docker compose logs leg2sub_gpu

Verify Docker is running:

docker ps

GPU not detected

Verify NVIDIA Container Toolkit:

docker run --rm --gpus all nvidia/cuda:11.0-runtime nvidia-smi

Expected output: GPU information should display

Out of memory error

  • Reduce batch size in Advanced Settings (try 4-8)
  • Process smaller files
  • Close other applications
  • Increase available disk space for temp files

Streamlit connection refused

Port 8501 already in use:

# Kill existing process or change port in docker-compose.yml
# Change: "8501:8501" to "8502:8501"

WSL 2 slow performance

Enable WSL 2 optimizations:

# In PowerShell as Admin:
wsl --set-default-version 2
wsl -l -v  # Verify WSL 2 is default

File permissions in WSL

# Fix permission issues (replace /root_dir with your project path):
sudo chown -R $USER:$USER /root_dir
chmod -R 755 /root_dir

Project Structure

Leg2Sub/
├── app/
│   └── template/
│       ├── homepage.py              # Main landing page
│       ├── sub_video_page.py        # Video subtitling interface
│       ├── translate_srt_page.py    # Subtitle translation interface
│       ├── transcribe_video_page.py # Video transcription interface
│       ├── sub_color_page.py        # Subtitle coloring interface
│       ├── static/
│       │   └── style.css            # Custom styling
│       └── streamlit.py             # App entry point
├── src/
│   ├── main_color_srt.py            # Color conversion logic
│   ├── main_subtitle.py             # Video processing pipeline
│   ├── main_transcriber.py          # Transcription entry point
│   └── main_translate_srt.py        # Translation logic
├── utils/
│   ├── ffmpeg_utils.py              # FFmpeg operations
│   ├── file_utils.py                # File handling
│   ├── subtitle_utils.py            # Subtitle processing
│   ├── translate_utils.py           # Translation utilities
│   ├── utils_func.py                # Helper functions
│   ├── whisper_utils.py             # Whisper integration
│   ├── whisperx_utils.py            # WhisperX integration
│   └── whisperx_transcription_utils.py # WhisperX helpers
├── assets/
│   ├── logo/                        # Logo files (SVG & PNG)
│   ├── content/                     # UI card images
│   └── demo/                        # Demo GIF
├── Dockerfile                       # Unified build with targets
├── docker-compose.yml               # Multi-service orchestration
├── requirements.txt                 # Python dependencies
└── README.md                        # This file

Docker Architecture

Dockerfile Targets

The Dockerfile uses multi-stage builds for efficient CPU and GPU deployments:

Base Stage (base):

  • Python 3.10-buster
  • Common dependencies (FFmpeg, tk, curl, etc.)
  • Pip packages from requirements.txt
  • Streamlit configuration

CPU Stage (cpu):

  • Inherits from base
  • Optimized for multi-core processing
  • Minimal overhead

GPU Stage (gpu):

  • NVIDIA CUDA 12.0.1 base
  • cuDNN 8 support
  • Python 3.10 with GPU optimization
  • All common dependencies

Docker Compose

Both services are defined side by side; start whichever matches your hardware:

leg2sub_cpu:
  build:
    target: cpu      # Builds base → cpu
  ports:
    - "8501:8501"
  # Standard resource limits

leg2sub_gpu:
  build:
    target: gpu      # Builds NVIDIA base → gpu
  runtime: nvidia    # GPU support
  devices:           # GPU allocation
    - driver: nvidia

Performance

Typical Processing Times

Task                       CPU (4-core)   GPU (RTX 3060)
Transcribe (30 min video)  30-45 min      5-8 min
Translate (500 subtitles)  2-3 min        1-2 min
Color (SRT → ASS)          <1 sec         <1 sec
Embed Subtitles            2-5 min        1-2 min
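
The CPU column implies a real-time factor of roughly 1.0-1.5× the video's duration, versus about 0.2-0.3× on GPU. A small helper to project times for other video lengths (factors inferred from the table above; actual times depend heavily on hardware and model size):

```python
def estimate_minutes(video_minutes: float, rtf: float) -> float:
    """Estimated processing time = video duration x real-time factor."""
    return video_minutes * rtf

# Rough real-time factor ranges inferred from the table:
CPU_RTF = (1.0, 1.5)    # 30 min video -> 30-45 min
GPU_RTF = (0.17, 0.27)  # 30 min video -> ~5-8 min
```

For example, a 60-minute video on a 4-core CPU would land somewhere around `estimate_minutes(60, 1.0)` to `estimate_minutes(60, 1.5)` minutes.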

Memory Usage

  • CPU Mode: 2-4 GB system RAM
  • GPU Mode: 2-4 GB system + 4-6 GB VRAM

Configuration Files

requirements.txt

Core dependencies:

  • streamlit==1.42.2 - Web interface
  • whisper - Speech recognition
  • whisperx - Enhanced transcription
  • deep_translator - Translation
  • ffmpeg_progress_yield - FFmpeg monitoring
  • pysrt - Subtitle handling

docker-compose.yml Highlights

# Memory limits (adjust as needed)
deploy.resources.limits.memory: 16G
deploy.resources.reservations.memory: 4G

# GPU device allocation
devices:
  - driver: nvidia
    device_ids: ['0']  # Change to use different GPU
    capabilities: [gpu, compute, utility]

# Restart policy
restart: unless-stopped  # Auto-restart on failure

Contributing

Contributions are welcome!

  1. Fork repository
  2. Create feature branch (git checkout -b feature/YourFeature)
  3. Commit changes (git commit -m 'feat: add YourFeature')
  4. Push to branch (git push origin feature/YourFeature)
  5. Open Pull Request

Areas for Contribution

  • ✅ New language support
  • ✅ UI/UX improvements
  • ✅ Performance optimization
  • ✅ Additional subtitle formats
  • ✅ Documentation and examples
  • ✅ Bug fixes and testing

License

This project is licensed under the GNU General Public License v3.0; see LICENSE for details.

Acknowledgments

  • OpenAI Whisper - Speech recognition
  • WhisperX - Enhanced transcription and alignment
  • LeGen - Subtitle processing reference
  • Python community - All open-source libraries

Contact & Support

📧 Email: [email protected]

🔗 LinkedIn: https://www.linkedin.com/in/paulo-goiss/

💬 GitHub Issues: Open an issue


Built with ❤️ using Python, Streamlit & FFmpeg

🔗 Repository · 📝 Issues · 📦 Releases · 👤 LinkedIn

Leg2Sub v1.0 | ✅ Production Ready | 🌐 Free & Open Source
