M3PDB

A Multi-Modal, Multi-Label, Multilingual Prompt Database
Explore the documentation of this project »

View Demo (Demo and Subjective Test) · Report Bug · Make a Suggestion

This README.md is intended for developers.

What's new 🔥

Table of Contents

Getting Started Guide

Development Configuration Requirements

The models used in this study have very different environment requirements, so in practice each model runs in its own dedicated environment, and the models collaborate through API calls. The environment setup for each model is documented separately in its respective folder.
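
As a rough illustration of this cross-environment collaboration (not the repo's actual interface), each model could be exposed as a local HTTP service and called from an orchestrator. The endpoint URL, port, and JSON fields below are hypothetical:

```python
# Hypothetical sketch: each model runs in its own environment behind a
# local HTTP service; an orchestrator calls it per file. The URL, port,
# and JSON fields are illustrative, not this repo's actual API.
import requests

def annotate(audio_path: str,
             service_url: str = "http://localhost:8000/annotate") -> dict:
    """Send one audio file path to a model service and return its labels."""
    resp = requests.post(service_url, json={"audio_path": audio_path}, timeout=60)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(annotate("example.wav"))
```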

Installation Steps
  1. Get a free API Key at https://chatgpt.com/ (a quick sanity check is sketched after these steps)
  2. Clone the repo
git clone https://github.com/hizening/M3PDB.git
  3. Set up the environments. Different subsystems require different environments; please refer to the readme.md of each subsystem for configuration.
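
Subsystems that call the LLM API need the key at runtime. Assuming it is exposed through an environment variable (the name OPENAI_API_KEY is an assumption here; check each subsystem's readme.md for what it actually reads), a minimal check looks like this:

```python
# Verify the API key is visible before running the annotation scripts.
# The variable name OPENAI_API_KEY is an assumption; each subsystem's
# readme.md documents what it actually reads.
import os
import sys

key = os.environ.get("OPENAI_API_KEY")
if not key:
    sys.exit("API key not set: export OPENAI_API_KEY first.")
print(f"API key found ({len(key)} characters).")
```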

File Directory Description

filetree
├── annotation_system/
│   ├── Qwen2-Audio/
│   ├── SenseVoice/
│   ├── emotion2vec/
│   ├── llmware/
│   └── readme.md
├── latency_aware_online_system/
│   ├── latency_aware_online_selection.py
│   └── readme.md
├── multi-model_prompt_registration/
│   ├── facetts/
│   ├── f2s.py
│   ├── s2s.py
│   ├── t2s.py
│   └── readme.md
├── multimodal_data_preprocessing/
│   ├── 3D-Speaker/
│   ├── speech/
│   ├── video/
│   └── readme.md
└── unseen_language_annotation/
    ├── lang_prob_confirm/
    ├── selection/
    └── readme.md

Dataset Construction

Multimodal Data Preprocessing


1. Run the script below to separate the audio and video streams (a minimal stand-in for steps 1 and 2 is sketched after step 6).

python multimodal_data_preprocessing/video/split_media.py

2. Run the script below to standardize the speech format.

python multimodal_data_preprocessing/speech/format_standardization.py

3. Run the script below to standardize the video format.

python multimodal_data_preprocessing/video/format_standardization.py

4. Run the script below to perform speech enhancement.

python multimodal_data_preprocessing/speech/speech_enhancement.py

5. Run the script below to perform video quality enhancement (super-resolution).

python multimodal_data_preprocessing/video/VideoSuperResolution/Train/eval.py

6. Run the scripts below to perform multimodal speaker diarization and VAD.

cd multimodal_data_preprocessing/3D-Speaker/egs/3dspeaker/speaker-diarization/
bash run_audio.sh
bash run_video.sh
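
The repo's own scripts implement these steps; as a minimal stand-in for steps 1 and 2, assuming ffmpeg is on the PATH and that the target speech format is 16 kHz mono 16-bit WAV (an assumed convention, not this repo's documented setting):

```python
# Illustrative stand-in for audio-video separation (step 1) and speech
# format standardization (step 2). Requires ffmpeg on PATH; the 16 kHz
# mono 16-bit WAV target is an assumption, not taken from this repo.
import subprocess
from pathlib import Path

def split_media(video_path: str, out_dir: str) -> tuple[str, str]:
    """Demux one video into a silent video track and a separate audio track."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(video_path).stem
    video_only = str(out / f"{stem}_video.mp4")
    audio_only = str(out / f"{stem}_audio.wav")
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-an", "-c:v", "copy", video_only], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", audio_only], check=True)
    return video_only, audio_only

def standardize_speech(audio_path: str, out_path: str, sr: int = 16000) -> None:
    """Re-encode audio to mono 16-bit PCM WAV at the target sample rate."""
    subprocess.run(["ffmpeg", "-y", "-i", audio_path, "-ac", "1", "-ar", str(sr),
                    "-acodec", "pcm_s16le", out_path], check=True)

if __name__ == "__main__":
    _, wav = split_media("example.mp4", "out")
    standardize_speech(wav, "out/example_16k.wav")
```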

… For more details, please read /multimodal_data_preprocessing/readme.md.

Annotation System


For more details, please read /annotation_system/readme.md.
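
Per the file tree, the annotation system combines several backends (Qwen2-Audio, SenseVoice, emotion2vec, llmware). As a hedged sketch of the multi-label idea, not the repo's actual code, per-model outputs for one clip could be merged into a single record; every key and label value below is illustrative:

```python
# Hypothetical multi-label aggregation: each backend model returns its own
# labels for a clip and the results are merged into one record. The keys
# and label values below are illustrative, not this repo's schema.
from typing import Callable, Dict

def aggregate_labels(audio_path: str,
                     annotators: Dict[str, Callable[[str], dict]]) -> dict:
    """Run every annotator on one clip and merge their label dicts."""
    record: dict = {"audio_path": audio_path}
    for name, annotate in annotators.items():
        record[name] = annotate(audio_path)
    return record

if __name__ == "__main__":
    fake_annotators = {
        "emotion": lambda p: {"label": "happy", "confidence": 0.91},
        "language": lambda p: {"label": "en", "confidence": 0.99},
    }
    print(aggregate_labels("example.wav", fake_annotators))
```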

Unseen Language Annotation


1. Run the script below to synthesize speech.

python unseen_language_annotation/lang_prob_confirm/tts/tts.py

2. Run the script below (DNSMOS) to evaluate the quality of the synthesized speech; a sketch for filtering the resulting CSV follows.

python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv
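
dnsmos_local.py writes per-clip scores to the CSV passed with -o. Assuming the usual DNSMOS column names (OVRL for overall quality; verify against your own output file), low-scoring clips could be filtered like this; the 3.0 threshold is an arbitrary illustration:

```python
# Filter synthesized clips by DNSMOS overall score. The OVRL column name
# follows common DNSMOS output; verify it against your own CSV. The 3.0
# threshold is an arbitrary choice for illustration.
import pandas as pd

scores = pd.read_csv("sample.csv")
keep = scores[scores["OVRL"] >= 3.0]
print(f"Kept {len(keep)} of {len(scores)} clips.")
keep.to_csv("sample_filtered.csv", index=False)
```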

… For more details, please read /unseen_language_annotation/readme.md.

Dataset Usage

Multi-model Prompt Registration


1. Run the script below to retrieve and register speech similar to an enrolled speech prompt (a generic similarity-matching sketch follows step 4).

python multi-model_prompt_registration/s2s.py

2. Run the script below to generate face-based reference speech from the registered face.

python multi-model_prompt_registration/facetts/inference.py

3. Run the script below to retrieve and register speech matching the registered face.

python multi-model_prompt_registration/f2s.py

4. Run the script below to retrieve and register speech matching the registered text.

python multi-model_prompt_registration/t2s.py
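
s2s.py, f2s.py, and t2s.py each match a query against registered prompts in some embedding space. The shared retrieval step can be illustrated with cosine similarity over precomputed embeddings; how those embeddings are produced (speaker, face, or text encoders) is model-specific, and every name and dimension below is a placeholder:

```python
# Generic retrieval step shared by s2s/f2s/t2s: rank registered prompt
# embeddings by cosine similarity to a query embedding. The encoders that
# produce these embeddings are not shown; sizes below are placeholders.
import numpy as np

def top_k_matches(query: np.ndarray, registry: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k registered embeddings most similar to the query."""
    q = query / np.linalg.norm(query)
    r = registry / np.linalg.norm(registry, axis=1, keepdims=True)
    sims = r @ q                      # cosine similarity, one value per prompt
    return np.argsort(-sims)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    registry = rng.normal(size=(100, 192))  # 100 placeholder prompt embeddings
    query = rng.normal(size=192)
    print(top_k_matches(query, registry))
```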

… For more details, please read /multi-model_prompt_registration/readme.md.

Latency Aware Online Selection


1. Run the script below to dynamically select the most suitable speech prompt under a latency budget (an illustrative scoring sketch follows).

python latency_aware_online_system/latency_aware_online_selection.py
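
The actual policy lives in latency_aware_online_selection.py; one plausible shape, offered purely as an assumption rather than the repo's algorithm, is to score candidates by quality penalized by latency and pick the best one within a budget (the scoring rule and the 0.1 weight are illustrative):

```python
# Illustrative latency-aware selection, not this repo's algorithm: among
# candidates that fit the latency budget, pick the best quality-latency
# trade-off. The scoring rule and the 0.1 weight are assumptions.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    prompt_id: str
    quality: float      # e.g., a similarity or MOS-like score
    latency_ms: float   # estimated time to fetch/use this prompt

def select(candidates: List[Candidate], budget_ms: float,
           lam: float = 0.1) -> Optional[Candidate]:
    """Return the best candidate within the latency budget, or None."""
    feasible = [c for c in candidates if c.latency_ms <= budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.quality - lam * c.latency_ms / budget_ms)

if __name__ == "__main__":
    pool = [Candidate("a", 0.90, 120), Candidate("b", 0.85, 40),
            Candidate("c", 0.95, 300)]
    print(select(pool, budget_ms=200))
```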

… For more details, please read /latency_aware_online_system/readme.md.

How to Contribute to the Open Source Project

Contributions make the open-source community an excellent place for learning, inspiration, and creation. Any contribution you make is greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Version Control

This project uses Git for version control. You can check the current available version in the repository.

Contact

If you have any comments or questions about M3PDB, please contact us.

License

M3PDB is released under the CC BY-NC-4.0 license.

Acknowledgements

M3PDB contains third-party components and code modified from some open-source repos, including:

  1. Datasets: Emilia Dataset, VoxCeleb, VoxPopuli
  2. Code: 3D-Speaker, Side-Profile-Detection, SenseVoice, emotion2vec, seamless_communication, CosyVoice, Whisper, Imaginary Voice, GPT-4o, deepface, OSUM, XTTS-v2
