🎭 DeepFake Detection using Gait Analysis

A comprehensive machine learning pipeline that detects deepfake videos by analyzing human gait patterns using MediaPipe pose estimation and advanced ML algorithms.

📋 Table of Contents

🎭 DeepFake Detection using Gait Analysis

🎯 Overview

This project implements a novel approach to deepfake detection by analyzing human gait patterns extracted from videos. Unlike traditional facial analysis methods, gait analysis is more robust against sophisticated deepfake generation techniques that primarily focus on facial features.

How It Works:

Pose Extraction: Uses MediaPipe to extract 33 body landmarks from video frames
Gait Analysis: Processes keypoints to extract meaningful gait features (coordinates + joint angles)
Machine Learning: Trains multiple models (Traditional ML + Deep Learning) to classify authentic vs deepfake videos
Detection: Analyzes new videos to determine authenticity with confidence scores

🧬 Features

✅ Multiple Model Support: RandomForest, SVM, Logistic Regression, KNN, Gradient Boosting, LSTM, CNN, Hybrid models
✅ Advanced Preprocessing: Normalization, feature engineering, sequence resampling
✅ Hyperparameter Tuning: Grid Search and Random Search optimization
✅ Comprehensive Evaluation: Confusion matrices, ROC curves, precision-recall analysis
✅ Real-time Detection: Process MP4 videos with confidence scoring
✅ Batch Processing: Analyze multiple videos simultaneously
✅ Visualization Tools: Gait pattern visualization and model performance charts
✅ Robust Pipeline: Error handling, logging, and reproducible results

🏗️ Project Structure

DeepFake-Detection/
├── 📁 data/
│   ├── 🎥 *.mp4                     # Input video files
│   ├── 📄 gait_keypoints.csv        # Extracted keypoints
│   └── 📁 processed/
│       ├── 🧮 X.npy                 # Feature matrix
│       ├── 🏷️ y.npy                  # Labels
│       └── 📋 labels.json           # Label mapping
├── 📁 src/
│   ├── 📁 preprocessing/
│   │   ├── 🔧 extract_gait.py       # Extract gait from videos
│   │   └── 🔄 preprocess_gait.py    # Feature engineering
│   ├── 📁 models/
│   │   ├── 🎯 train_baseline.py     # Train ML models
│   │   ├── 🔬 hyperparameter_tuning.py  # Optimize models
│   │   └── 🧠 deep_learning_models.py   # Neural networks
│   └── 📁 utils/
│       └── 📊 visualize_gait.py     # Visualization tools
├── 📁 models/                       # Saved trained models
├── 📁 evaluation_results/           # Model evaluation outputs
├── 🚀 detect_deepfake.py           # Main detection script
├── 📈 evaluation_and_visualization.py  # Comprehensive evaluation
├── 🎨 visualize_menu.py            # Visualization interface
├── 📋 requirements.txt             # Dependencies
└── 📚 README.md                    # This file

⚙️ Installation

Prerequisites

Python 3.8+
Webcam or MP4 video files
At least 4GB RAM (8GB recommended)
CUDA-compatible GPU (optional, for deep learning)

1. Clone Repository

git clone https://github.com/your-username/DeepFake-Detection.git
cd DeepFake-Detection

2. Create Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Verify Installation

python -c "import cv2, mediapipe, sklearn, numpy, pandas; print('✅ All dependencies installed successfully!')"

🚀 Quick Start

30-Second Demo

Add your videos to the data/ folder (MP4 format)

Extract gait patterns:

python src/preprocessing/extract_gait.py

Preprocess data:

python src/preprocessing/preprocess_gait.py

Train models:
```
python src/models/train_baseline.py
```

Detect deepfakes:

python detect_deepfake.py data/your_video.mp4

That's it! 🎉 You'll get a prediction with confidence score.

📖 Detailed Usage Guide

1. Data Preprocessing

Extract Gait from Videos

# Extract from all MP4 files in data folder
python src/preprocessing/extract_gait.py

# Output: data/gait_keypoints.csv with pose landmarks

What it does:

Processes all .mp4 files in the data/ directory
Extracts 33 MediaPipe pose landmarks per frame
Saves combined results with video identification
Reports detection statistics

Feature Engineering

# Convert raw keypoints to ML-ready features
python src/preprocessing/preprocess_gait.py

# Outputs:
# - data/processed/X.npy (feature matrix)
# - data/processed/y.npy (labels)
# - data/processed/labels.json (label mapping)

What it does:

Normalizes gait sequences using torso length
Computes joint angles (hip-knee-ankle, elbow angles)
Resamples sequences to uniform length (64 frames)
Creates synthetic deepfake data for training
Generates final feature matrix (70 features per frame)

2. Gait Visualization

# Interactive visualization menu
python visualize_menu.py

# Available options:
# 1. Static pose plots
# 2. Animated gait sequences
# 3. Joint angle analysis
# 4. Feature distribution plots

Visualization Features:

📊 Pose landmark scatter plots
🎬 Animated walking sequences
📈 Joint angle time series
🎯 Feature distribution analysis
👥 Person-wise comparisons

3. Model Training

Train Baseline Models

# Train multiple ML algorithms
python src/models/train_baseline.py

# Models trained:
# - RandomForest
# - SVM
# - Logistic Regression
# - K-Nearest Neighbors
# - Gradient Boosting

Output:

Trained models saved in models/ directory
Performance metrics and comparison
Best model automatically selected
Cross-validation results

Training Configuration

# Customize training in train_baseline.py
MODELS_CONFIG = {
    'test_size': 0.2,        # 20% data for testing
    'cv_folds': 5,           # 5-fold cross-validation
    'random_state': 42,      # Reproducible results
    'n_jobs': -1            # Use all CPU cores
}

4. Hyperparameter Tuning

# Optimize model parameters
python src/models/hyperparameter_tuning.py

# Methods:
# 1. Grid Search (thorough)
# 2. Random Search (faster)
# 3. Model comparison
# 4. Best model selection

Tuning Process:

Grid Search: Exhaustive parameter exploration
Random Search: Efficient parameter sampling
Cross-validation: Robust performance estimation
Model Selection: Best performing configuration

5. Deep Learning Models

# Train neural networks (requires TensorFlow)
python src/models/deep_learning_models.py

# Architectures:
# - LSTM (temporal patterns)
# - CNN (spatial features)
# - Hybrid (CNN + LSTM)

Deep Learning Features:

🧠 LSTM: Captures temporal gait dynamics
🖼️ CNN: Extracts spatial pose patterns
🔄 Hybrid: Combines spatial and temporal analysis
📊 Advanced Metrics: Precision, recall, F1-score, AUC

6. Model Evaluation

# Comprehensive model analysis
python evaluation_and_visualization.py

# Generates:
# - Confusion matrices
# - ROC curves
# - Precision-recall curves
# - Model comparison charts
# - Detailed performance report

Evaluation Outputs:

📊 evaluation_results/model_comparison.png
📈 evaluation_results/combined_roc_curves.png
📋 evaluation_results/evaluation_report.txt
🎯 Individual model performance plots

7. Deepfake Detection

Single Video Analysis

# Analyze one video
python detect_deepfake.py path/to/video.mp4

# With custom threshold
python detect_deepfake.py video.mp4 --threshold 0.7

# Save results to JSON
python detect_deepfake.py video.mp4 --output results.json

Batch Processing

# Analyze all videos in directory
python detect_deepfake.py data/ --batch

# Batch with output file
python detect_deepfake.py data/ --batch --output batch_results.json

Detection Output:

{
  "video_path": "data/test_video.mp4",
  "prediction": "Deepfake",
  "deepfake_probability": 0.824,
  "confidence": 0.824,
  "threshold": 0.5,
  "model_type": "RandomForest",
  "frames_analyzed": 127
}

🎥 Testing with Your Videos

Supported Formats

✅ MP4 (recommended)
✅ AVI
✅ MOV
✅ WEBM

Video Requirements

Resolution: 480p or higher recommended
Duration: 2-30 seconds optimal
Content: Clear view of person walking/moving
Quality: Good lighting, minimal occlusion

Test Video Preparation

# 1. Add videos to data folder
cp your_videos/*.mp4 data/

# 2. Verify video format
python -c "import cv2; cap = cv2.VideoCapture('data/test.mp4'); print(f'Video OK: {cap.isOpened()}')"

# 3. Quick detection test
python detect_deepfake.py data/test.mp4

Expected Results

Authentic Video: prediction: "Authentic", confidence > 0.6
Deepfake Video: prediction: "Deepfake", confidence > 0.6
Low Confidence: May indicate poor video quality or edge cases

📊 Model Performance

Baseline Performance (Example)

Model	Accuracy	Precision	Recall	F1-Score	AUC
RandomForest	0.892	0.885	0.891	0.888	0.924
SVM	0.876	0.872	0.879	0.875	0.918
Logistic Regression	0.834	0.829	0.831	0.830	0.902
KNN	0.823	0.818	0.825	0.821	0.889
Gradient Boosting	0.887	0.883	0.884	0.883	0.921

Deep Learning Performance

Model	Accuracy	Precision	Recall	F1-Score	Parameters
LSTM	0.914	0.908	0.912	0.910	124K
CNN	0.901	0.896	0.903	0.899	89K
Hybrid	0.923	0.919	0.921	0.920	156K

Performance Factors

Data Quality: Clean pose detection improves accuracy
Video Length: 2-10 seconds optimal for gait analysis
Movement Type: Walking/running works best
Background: Minimal interference preferred

🛠️ Troubleshooting

Common Issues

1. "No module named 'tensorflow'"

# Install TensorFlow
pip install tensorflow

# For GPU support
pip install tensorflow-gpu

2. "MediaPipe pose detection failed"

# Check video file
python -c "import cv2; print(cv2.VideoCapture('your_video.mp4').isOpened())"

# Verify MediaPipe
python -c "import mediapipe as mp; print('MediaPipe OK')"

3. "No pose landmarks detected"

Ensure person is visible in video
Check lighting conditions
Verify video quality and resolution
Try different video or adjust detection confidence

4. "Insufficient training data"

# Check data files
ls data/processed/
# Should contain: X.npy, y.npy, labels.json

# Regenerate if missing
python src/preprocessing/preprocess_gait.py

5. "Model not found"

# Train models first
python src/models/train_baseline.py

# Check models directory
ls models/

Performance Optimization

Speed Up Training

# Reduce cross-validation folds
cv_folds = 3  # instead of 5

# Use fewer hyperparameters
n_iter = 20  # for random search

# Limit model complexity
max_depth = 10  # for tree-based models

Memory Optimization

# Process videos in smaller batches
batch_size = 16  # for deep learning

# Reduce sequence length
target_frames = 32  # instead of 64

# Use feature selection
from sklearn.feature_selection import SelectKBest

Debug Mode

# Enable verbose logging
export PYTHONPATH="."
python -v detect_deepfake.py video.mp4

# Check intermediate outputs
python src/preprocessing/extract_gait.py --debug

# Validate data integrity
python -c "import numpy as np; X = np.load('data/processed/X.npy'); print(f'Data shape: {X.shape}, No NaN: {not np.isnan(X).any()}')"

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Setup

# Fork and clone the repository
git clone https://github.com/your-username/DeepFake-Detection.git
cd DeepFake-Detection

# Create development branch
git checkout -b feature/your-feature-name

# Install development dependencies
pip install -r requirements.txt
pip install pytest black flake8  # for testing and code formatting

Code Style

# Format code
black src/ --line-length 88

# Check style
flake8 src/ --max-line-length 88

# Run tests
pytest tests/

Contribution Areas

🚀 New Models: Implement additional ML algorithms
📊 Visualization: Create new plot types and analysis tools
🎥 Video Processing: Support more formats and preprocessing options
📱 Mobile Support: Optimize for mobile deployment
🔧 Performance: Speed and memory optimizations
📚 Documentation: Improve guides and examples

Pull Request Process

Create feature branch
Add tests for new functionality
Update documentation
Ensure all tests pass
Submit pull request with clear description

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🎯 Quick Commands Reference

# Complete pipeline (first time)
python src/preprocessing/extract_gait.py
python src/preprocessing/preprocess_gait.py
python src/models/train_baseline.py
python detect_deepfake.py your_video.mp4

# Hyperparameter tuning
python src/models/hyperparameter_tuning.py

# Deep learning training
python src/models/deep_learning_models.py

# Comprehensive evaluation
python evaluation_and_visualization.py

# Batch detection
python detect_deepfake.py data/ --batch --output results.json

# Visualization
python visualize_menu.py

📞 Support

🐛 Bug Reports: GitHub Issues
💡 Feature Requests: GitHub Discussions
📧 Email: [email protected]
📖 Documentation: Wiki

Made with ❤️ for advancing deepfake detection research

**⭐ Star this repo if you find it useful!**DeepFake-Detection Using gait analysis to study a person’s walking patterns in order to determine if a video or media depicts the real individual or a manipulated deepfake.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
data		data
models		models
results		results
src		src
.gitignore		.gitignore
README.md		README.md
create_skeleton_visualization.py		create_skeleton_visualization.py
create_test_videos.py		create_test_videos.py
extract_and_train.py		extract_and_train.py
gait_verification_system.py		gait_verification_system.py
regenerate_all_visualizations.py		regenerate_all_visualizations.py
regenerate_keypoints_csv.py		regenerate_keypoints_csv.py
requirements.txt		requirements.txt
retrain_with_new_videos.py		retrain_with_new_videos.py
show_outputs.py		show_outputs.py
train_new_dataset.py		train_new_dataset.py

a2rizing/DeepFake-Detection

Folders and files

Latest commit

History

Repository files navigation