Soumyya Kanti Datta, Tanvi Ranga, Chengzhe Sun, Siwei Lyu
Accepted by APAI ICCV 2025
The rise of manipulated media has made deepfakes a particularly insidious threat, spanning generative manipulations such as lip-sync modification, face swapping, and avatar-driven facial synthesis. Conventional detection methods, which predominantly depend on manually designed phoneme–viseme alignment thresholds, basic frame-level consistency checks, or unimodal detection strategies, fail to reliably identify modern deepfakes produced by advanced generative models such as GANs, diffusion models, and neural rendering techniques. These techniques generate nearly flawless individual frames yet inadvertently introduce subtle temporal discrepancies that traditional detectors frequently overlook. We present a novel multimodal audio-visual framework, Phoneme-Temporal and Identity-Dynamic Analysis (PIA), that incorporates language, dynamic face motion, and facial identity cues to address these limitations. We utilize phoneme sequences, lip geometry data, and advanced facial identity embeddings. This integrated approach significantly improves the detection of subtle deepfake alterations by identifying inconsistencies across multiple complementary modalities.
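For intuition only, the sketch below shows one way the three streams described above (phoneme tokens, lip-geometry trajectories, and per-frame identity embeddings) could be fused for real/fake classification. It is a minimal PyTorch illustration: the layer choices, dimensions, and late-fusion head are assumptions for exposition and do not reproduce the released PIA model.

    # Minimal sketch of a multimodal deepfake classifier (illustrative only).
    # All module names, sizes, and the late-fusion strategy are assumptions;
    # they do NOT reproduce the released PIA architecture.
    import torch
    import torch.nn as nn

    class MultimodalDeepfakeClassifier(nn.Module):
        def __init__(self, num_phonemes=70, id_dim=512, lip_dim=40, hidden=128):
            super().__init__()
            # Phoneme stream: embed phoneme tokens, summarize with a GRU.
            self.phoneme_embed = nn.Embedding(num_phonemes, hidden)
            self.phoneme_gru = nn.GRU(hidden, hidden, batch_first=True)
            # Lip-geometry stream: per-frame landmark features over time.
            self.lip_gru = nn.GRU(lip_dim, hidden, batch_first=True)
            # Identity-dynamics stream: per-frame face-identity embeddings over time.
            self.id_gru = nn.GRU(id_dim, hidden, batch_first=True)
            # Late fusion: concatenate the three summaries and classify real vs. fake.
            self.head = nn.Sequential(
                nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2)
            )

        def forward(self, phonemes, lips, ids):
            # phonemes: (B, T_p) int64, lips: (B, T_v, lip_dim), ids: (B, T_v, id_dim)
            _, h_p = self.phoneme_gru(self.phoneme_embed(phonemes))
            _, h_l = self.lip_gru(lips)
            _, h_i = self.id_gru(ids)
            fused = torch.cat([h_p[-1], h_l[-1], h_i[-1]], dim=-1)
            return self.head(fused)  # logits over {real, fake}

    # Toy forward pass with random tensors, just to show the expected shapes.
    model = MultimodalDeepfakeClassifier()
    logits = model(torch.randint(0, 70, (2, 30)),
                   torch.randn(2, 90, 40),
                   torch.randn(2, 90, 512))
    print(logits.shape)  # torch.Size([2, 2])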
- Python 3.12
- Install the required packages: pip install -r requirements.txt
- Install espeak-ng: apt-get update -y && apt-get install -y espeak-ng
- Model weights can be found in the ./checkpoints folder.
- The input video should contain the face of only one subject throughout the entire video.
- Each frame should contain exactly one face (a sketch of such a check appears after this list).
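The snippet below is an optional pre-check illustrating how the single-face requirement could be verified before running the pipeline. It is not part of this repository; it uses OpenCV's Haar cascade purely as an example detector, and the file path is a placeholder.

    # Optional pre-check (not part of the released pipeline): verify that every
    # frame of the input video contains exactly one detectable face.
    import cv2

    def check_single_face(video_path: str) -> bool:
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
        )
        cap = cv2.VideoCapture(video_path)
        frame_idx, ok = 0, True
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces) != 1:
                print(f"Frame {frame_idx}: found {len(faces)} faces")
                ok = False
            frame_idx += 1
        cap.release()
        return ok

    if __name__ == "__main__":
        print(check_single_face("input.mp4"))  # hypothetical path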
The input video should be in MP4 format:
python main.py --video {input_video_path} --outdir {output_path}
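For example, with placeholder paths: python main.py --video ./samples/demo.mp4 --outdir ./results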
@misc{datta2025piadeepfakedetectionusing,
title={PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis},
author={Soumyya Kanti Datta and Tanvi Ranga and Chengzhe Sun and Siwei Lyu},
year={2025},
eprint={2510.14241},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14241},
}
This work is supported by the Center for Identification Technology Research (CITeR) and the National Science Foundation under Grant No. 1822190.