Farzaneh Jafari, Stefano Berretti, Anup Basu
Emotion and intensity conditioning:
- 6 Emotions: Happy 😊, Sad 😢, Angry 😠, Disgust 🤢, Fear 😨, Upset 😔, plus Neutral 😐
- 3 Intensity Levels: Low (⚪), Medium (🔵), High (🔴)
- 19 Unique Combinations: each of the 6 × 3 = 18 emotion × intensity pairs produces a distinct expression, plus Neutral
SED (speech emotion diarization) features:
- Automatic emotion detection from audio
- Temporal segmentation with configurable chunk sizes
- Intensity estimation (low/medium/high)
- Chunk reduction, yielding fewer, longer and smoother segments
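This README does not spell out how chunk reduction works. Below is a minimal sketch, assuming it means merging neighboring chunks that carry the same emotion and intensity, and optionally folding very short segments into the one before them. `Chunk`, `reduce_chunks` and `min_duration` are hypothetical names, not part of the SEDTalker code.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Chunk:
    start: float    # seconds
    end: float      # seconds
    emotion: str    # SED output code, e.g. "h" or "n" (hypothetical representation)
    intensity: int  # 1 = low, 2 = medium, 3 = high


def _same_label(a: Chunk, b: Chunk) -> bool:
    return a.emotion == b.emotion and a.intensity == b.intensity


def _merge_pass(segments: List[Chunk], min_duration: float) -> List[Chunk]:
    """Extend the previous segment when the next one has the same label or is too short.

    The very first segment is always kept, even if it is shorter than min_duration.
    """
    out: List[Chunk] = []
    for seg in segments:
        too_short = (seg.end - seg.start) < min_duration
        if out and (_same_label(out[-1], seg) or too_short):
            out[-1].end = seg.end  # absorb seg into the previous segment
        else:
            out.append(replace(seg))  # copy, so the caller's chunks are not mutated
    return out


def reduce_chunks(chunks: List[Chunk], min_duration: float = 0.0) -> List[Chunk]:
    """Hypothetical chunk reduction: merge identical runs, then absorb short runs."""
    ordered = sorted(chunks, key=lambda c: c.start)
    runs = _merge_pass(ordered, 0.0)        # pass 1: merge identical neighbors only
    return _merge_pass(runs, min_duration)  # pass 2: absorb short runs, re-merge
```

The two-pass design keeps a long run made of many short chunks intact: identical chunks are merged first, and only then is the minimum-duration rule applied to whole runs.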
Requirements:
- Python 3.8+
- CUDA-capable GPU (recommended)
- FFmpeg (for video rendering with audio)
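A quick way to check these requirements before installing is sketched below. It assumes PyTorch is present in the JambaTalk environment (the checkpoints are `.pth` files) and only reports what it finds; `check_environment` is a hypothetical helper, not a script shipped with SEDTalker.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict:
    """Report whether the listed requirements appear to be satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "ffmpeg_found": shutil.which("ffmpeg") is not None,  # needed for video + audio
        "cuda_available": False,
    }
    # Assumption: PyTorch comes with the JambaTalk environment. If it is missing,
    # we simply report that no CUDA device was detected.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:15s} {'yes' if ok else 'no'}")
```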
Installation:
- Set up the JambaTalk environment by following the installation instructions in the official JambaTalk GitHub repository.
- Clone this repository:
  git clone https://github.com/your-repo/SEDTalker.git
  cd SEDTalker
- Install the additional dependencies:
  pip install scipy pyrender opencv-python
- Download the pre-trained JambaTalk and SED models.
- Extract and organize them:
# Extract the downloaded models
unzip models.zip

# Expected structure:
# SEDTalker/
# ├── EmoVOCA/
# │   ├── save/
# │   │   └── 50_model.pth        # JambaTalk model trained on EmoVOCA
# │   ├── templates.pkl
# │   └── FLAME_sample.ply
# └── SED/
#     └── results/
#         └── emotion_diarization_7class/
#             └── save/CKPT+epoch_50/
#                 └── model.ckpt  # SED model

Train on EmoVOCA:
python train.py \
--dataset EmoVOCA \
--lr 0.0001 \
--max_epoch 100 \
--feature_dim 512 \
--device cuda

New training features relative to JambaTalk:
- Emotion-conditioned generation
- 3 intensity levels per emotion
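How SEDTalker feeds the emotion and intensity into the network is not documented here. As an illustration only, the sketch below assumes a simple conditioning vector: a one-hot emotion over the seven JambaTalk labels concatenated with a one-hot intensity, left empty for Neutral. It also enumerates the 6 × 3 + 1 = 19 conditions listed at the top of this README. `encode_condition` and `all_conditions` are hypothetical names, and this is not the authors' encoding.

```python
from itertools import product
from typing import List, Optional, Tuple

import numpy as np

EMOTIONS = ["happy", "sad", "angry", "disgust", "fear", "upset"]  # JambaTalk labels
NEUTRAL = "neutral"
INTENSITIES = [1, 2, 3]  # low, medium, high


def all_conditions() -> List[Tuple[str, Optional[int]]]:
    """6 x 3 = 18 emotion/intensity pairs, plus Neutral (no intensity) = 19."""
    return list(product(EMOTIONS, INTENSITIES)) + [(NEUTRAL, None)]


def encode_condition(emotion: str, intensity: Optional[int]) -> np.ndarray:
    """Hypothetical conditioning vector: one-hot emotion (7) ++ one-hot intensity (3)."""
    labels = EMOTIONS + [NEUTRAL]
    vec = np.zeros(len(labels) + len(INTENSITIES), dtype=np.float32)
    vec[labels.index(emotion)] = 1.0  # raises ValueError for an unknown emotion
    if emotion != NEUTRAL:
        if intensity not in INTENSITIES:
            raise ValueError(f"intensity must be one of {INTENSITIES}, got {intensity}")
        vec[len(labels) + INTENSITIES.index(intensity)] = 1.0
    return vec
```

Under this encoding, `--test_emotion` with `--test_intensity 3` would select the High slot of the intensity part.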
Test a trained model:
python test.py \
--dataset EmoVOCA \
--save_path save \
--max_epoch 50 \
--test_emotion Smile2 \
--test_intensity 3

SED output labels and their JambaTalk equivalents:

| SED Output | JambaTalk | Emoji | Description |
|---|---|---|---|
| h | happy | 😊 | Joyful, smiling |
| s | sad | 😢 | Sorrowful, downcast |
| a | angry | 😠 | Frustrated, tense |
| d | disgust | 🤢 | Repulsed, negative |
| f | fear | 😨 | Afraid, anxious |
| u | upset | 😔 | Disappointed, troubled |
| n | neutral | 😐 | Baseline, calm |
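If the SED model's output is available as a list of `(start, end, code)` segments (an assumed format), the table above translates directly into a lookup. `SED_TO_JAMBATALK` and `to_jambatalk_labels` are illustrative names, not part of the released code.

```python
from typing import Dict, List, Tuple

# Mapping copied from the table above.
SED_TO_JAMBATALK: Dict[str, str] = {
    "h": "happy",
    "s": "sad",
    "a": "angry",
    "d": "disgust",
    "f": "fear",
    "u": "upset",
    "n": "neutral",
}


def to_jambatalk_labels(
    segments: List[Tuple[float, float, str]],
) -> List[Tuple[float, float, str]]:
    """Translate (start, end, sed_code) segments into JambaTalk emotion labels."""
    out = []
    for start, end, code in segments:
        try:
            out.append((start, end, SED_TO_JAMBATALK[code]))
        except KeyError:
            raise ValueError(f"unknown SED label {code!r}") from None
    return out
```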
Intensity levels:
- ████ High (3) 🔴 - Maximum expression strength
- ▄▄▄▄ Medium (2) 🔵 - Moderate expression
- ▁▁▁▁ Low (1) ⚪ - Subtle expression
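The intensity estimator's internals are not described in this README. A minimal sketch, assuming it produces a score in [0, 1] that is cut at hypothetical thresholds of 1/3 and 2/3, maps that score onto the three levels above. `intensity_level`, `describe` and the thresholds are illustrative assumptions.

```python
# Level number -> (name, emoji, bar), as listed above.
LEVELS = {
    1: ("Low", "⚪", "▁▁▁▁"),
    2: ("Medium", "🔵", "▄▄▄▄"),
    3: ("High", "🔴", "████"),
}


def intensity_level(score: float, low_max: float = 1 / 3, medium_max: float = 2 / 3) -> int:
    """Quantize a score in [0, 1] into 1 (low), 2 (medium) or 3 (high).

    The default thresholds are hypothetical, not taken from SEDTalker.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < low_max:
        return 1
    if score < medium_max:
        return 2
    return 3


def describe(score: float) -> str:
    level = intensity_level(score)
    name, emoji, bar = LEVELS[level]
    return f"{bar} {name} ({level}) {emoji}"
```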
If you use SEDTalker in your research, please cite:
@article{sedtalker2025jafari,
title={SEDTalker: Speech-Driven 3D Facial Animation with Emotion Conditioning},
author={Farzaneh Jafari and Stefano Berretti and Anup Basu},
journal={arXiv preprint},
year={2026}
}
@article{jambatalk2026jafari,
title={JambaTalk: Speech-driven 3D Talking Head Generation based on a Hybrid Transformer-Mamba Model},
author={Farzaneh Jafari and Stefano Berretti and Anup Basu},
journal={ACM Transactions on Multimedia Computing, Communications, and Applications},
doi={10.1145/3793196},
year={2026}
}

This work builds on:
- JambaTalk: Hybrid Transformer-Mamba architecture for facial animation
- EmoVOCA: Emotional 3D talking-head dataset
This project is licensed under the MIT License - see the LICENSE file for details.
