A Flask-based web application that automatically converts screen-capture tutorial videos into structured markdown documentation using AI-powered transcription, OCR, and content analysis.
- Automated Video Processing: Upload tutorial videos and receive markdown documentation via email
- AI-Powered Analysis: Uses OpenAI GPT-4o and Whisper for transcription and content extraction
- Screen-Optimized Frame Extraction: Multiple detection methods optimized for screen recordings
- Intelligent OCR: Extracts text from video frames using vision language models
- Async Processing: Celery-based task queue for handling long-running video processing jobs
- Email Delivery: Automatic email notifications with generated documentation attachments
- Upload: Users upload video files through a web interface
- Audio Transcription: Extract and transcribe audio using Whisper API
- Frame Analysis: Extract key frames using scene detection, motion analysis, and periodic sampling
- OCR Processing: Extract text content from frames using OpenAI vision models
- Segment Alignment: Match transcript segments with relevant visual frames
- Tutorial Generation: Create structured step-by-step instructions using GPT-4o
- Delivery: Email the final markdown documentation to the user
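The stages above map naturally onto a single background job. The sketch below illustrates one way they might be chained inside a Celery task; every stage function is a placeholder, and none of the names are taken from this repository.

```python
# Minimal sketch of the processing job; all stage functions are placeholders,
# not the repository's real helpers.
from celery import Celery

celery_app = Celery("tutorial_processor", broker="redis://localhost:6379/0")

def extract_audio(video_path: str) -> str: ...          # FFmpeg audio extraction
def transcribe(audio_path: str) -> list: ...            # Whisper, timestamped segments
def extract_key_frames(video_path: str) -> list: ...    # scene/motion/periodic sampling
def ocr_frames(frames: list) -> list: ...               # vision-model OCR
def align(transcript: list, frames: list) -> list: ...  # match narration to frames
def write_tutorial(aligned: list) -> str: ...           # GPT-4o step generation
def email_result(to_addr: str, markdown: str) -> None: ...  # SMTP delivery

@celery_app.task
def process_video(video_path: str, recipient: str) -> None:
    transcript = transcribe(extract_audio(video_path))
    frames = extract_key_frames(video_path)
    aligned = align(transcript, ocr_frames(frames))
    email_result(recipient, write_tutorial(aligned))
```

In this repository, the real orchestration lives in celery_worker.py and processor.py (see the project structure below).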
- Python 3.8+
- Redis server
- FFmpeg
- OpenAI API key
- SMTP server access
- Clone the repository:

  ```bash
  git clone https://github.com/ngaited/PrattVideoTutorialProcessor.git
  cd PrattVideoTutorialProcessor
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables: Create a `.env` file with the following variables:

  ```
  OPENAI_API_KEY=your_openai_api_key
  WHISPER_BASE_URL=http://your-whisper-service:8009/
  CELERY_BROKER_URL=redis://localhost:6379/0
  CELERY_RESULT_BACKEND=redis://localhost:6379/0
  FLASK_SECRET_KEY=your_secret_key
  [email protected]
  SMTP_HOST=smtp.your-provider.com
  SMTP_PORT=587
  SMTP_USER=your_smtp_username
  SMTP_PASSWORD=your_smtp_password
  ```
- Start the Redis server:

  ```bash
  redis-server
  ```
- Start the Celery worker (in one terminal):

  ```bash
  celery -A celery_worker worker --loglevel=info
  ```
- Start the Flask application (in another terminal):

  ```bash
  python app.py
  ```
- Access the web interface: Open your browser and go to `http://localhost:5000`.
- Upload a video:
  - Select a screen-capture tutorial video
  - Enter your email address
  - Click "Submit Job"
- Wait for processing:
  - Processing time varies with video length (3 min video ≈ 5 min processing)
  - You'll receive an email with the generated markdown documentation
For best results when recording tutorial videos:
- Use clear, detailed narration that describes each step
- Screen recordings work better than camera footage
- Ensure good audio quality for accurate transcription
- Keep videos focused on specific tutorial topics
- Avoid background noise and interruptions
```
PrattVideoTutorialProcessor/
├── app.py                 # Flask web application
├── celery_worker.py       # Celery task processing
├── processor.py           # Main video processing pipeline
├── requirements.txt       # Python dependencies
├── templates/
│   └── index.html         # Upload interface
├── static/
│   └── asset/
│       └── Pratt_Institute_Logo.svg
├── uploads/               # Temporary video storage
└── job_outputs/           # Processing results (auto-cleaned)
```
This project integrates with several external services:
- OpenAI API: For GPT-4o text generation and vision processing
- Whisper API: For audio transcription (requires separate service)
- Redis: For task queue management
- SMTP Server: For email delivery
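The following is a minimal sketch of how these services might be wired up from the environment variables listed earlier. It assumes the `openai` and `celery` Python packages and an OpenAI-compatible Whisper endpoint (as the `WHISPER_BASE_URL` variable suggests); the client names are illustrative.

```python
# Illustrative service wiring; variable names follow the .env example above,
# everything else (client names, OpenAI-compatible Whisper endpoint) is assumed.
import os

from celery import Celery
from openai import OpenAI

# Redis-backed task queue for background processing
celery_app = Celery(
    "tutorial_processor",
    broker=os.environ["CELERY_BROKER_URL"],
    backend=os.environ["CELERY_RESULT_BACKEND"],
)

# GPT-4o text and vision requests go to the standard OpenAI endpoint
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Whisper transcription requests go to the separate self-hosted service
whisper_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["WHISPER_BASE_URL"],
)
```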
The application follows a sophisticated multi-stage pipeline:
- Audio Extraction: Uses FFmpeg to extract audio optimized for speech recognition
- Frame Extraction: Combines multiple detection methods:
  - Scene change detection
  - Motion-based detection
  - Periodic sampling
  - Histogram analysis
- Frame Deduplication: Removes similar frames using perceptual hashing (see the sketch after this list)
- OCR Processing: Extracts text from frames using OpenAI vision models
- Segment Alignment: Matches transcript segments with relevant frames
- Tutorial Generation: Creates structured instructions using GPT-4o
- Finalization: Merges all steps into coherent markdown documentation
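For the deduplication step, a hedged sketch of perceptual-hash filtering is shown below. It assumes the Pillow and `imagehash` packages, and the distance threshold is illustrative rather than the value used in processor.py.

```python
# Minimal sketch of perceptual-hash deduplication; the threshold is illustrative.
from PIL import Image
import imagehash

def deduplicate_frames(frame_paths, max_distance=5):
    """Keep only frames whose perceptual hash differs from all kept frames."""
    kept, kept_hashes = [], []
    for path in frame_paths:
        frame_hash = imagehash.phash(Image.open(path))
        # Hash subtraction gives a Hamming distance; small distance == visually similar
        if all(frame_hash - h > max_distance for h in kept_hashes):
            kept.append(path)
            kept_hashes.append(frame_hash)
    return kept
```

Perceptual hashes change very little between visually similar frames, so a small Hamming-distance threshold is a cheap way to drop near-duplicates of the same screen.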
The system uses optimized parameters for video tutorial transcription:
- Temperature: 0.0 (for deterministic, consistent output)
- Language: English
- Returns timestamps for segment alignment
- Optimized for technical content
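As a hedged example, a transcription request with these parameters might look like the following. It assumes the Whisper service implements the OpenAI audio transcription API; the model name and the exact URL path depend on how that service is exposed.

```python
# Hedged example call; the model name and endpoint path are assumptions that
# depend on how the self-hosted Whisper service is configured.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["WHISPER_BASE_URL"],  # e.g. http://your-whisper-service:8009/v1
)

with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-1",               # model name depends on the service
        file=audio_file,
        language="en",                   # English narration
        temperature=0.0,                 # deterministic decoding
        response_format="verbose_json",  # includes per-segment timestamps
    )

for segment in result.segments or []:
    print(f"[{segment.start:8.2f}s - {segment.end:8.2f}s] {segment.text}")
```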
Frame extraction is tuned separately for screen content:
- Adaptive sampling based on video duration
- Multiple detection thresholds for screen content
- Minimum/maximum frame limits to ensure quality
- Optimized for screen recordings vs. general video
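The actual thresholds live in processor.py. As an illustration only, adaptive sampling with frame limits might be implemented along these lines (all constants here are assumptions):

```python
# Illustrative adaptive-sampling helper; the constants are assumptions,
# not the values used by processor.py.
def sampling_interval(duration_s: float, min_frames: int = 10, max_frames: int = 120) -> float:
    """Pick a periodic-sampling interval so the frame count stays within bounds."""
    target = max(min_frames, min(max_frames, int(duration_s / 5)))  # roughly 1 frame per 5 s
    return duration_s / target

# Example: a 3-minute (180 s) screen recording
print(sampling_interval(180.0))  # 5.0 seconds between periodic samples
```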
- Comprehensive error handling throughout the pipeline
- Email notifications for both success and failure cases (see the sketch after this list)
- Automatic cleanup of temporary files
- Detailed logging for debugging
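Below is a minimal sketch of the notification-and-cleanup pattern described above. It reuses the `SMTP_*` variables from the `.env` example, while the function names and the use of the SMTP user as the sender address are assumptions.

```python
# Illustrative failure-notification and cleanup wrapper; not the project's actual code.
import os
import shutil
import smtplib
from email.message import EmailMessage

def notify(recipient: str, subject: str, body: str) -> None:
    msg = EmailMessage()
    msg["From"] = os.environ["SMTP_USER"]  # assumed sender; the app may use a dedicated from-address
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP(os.environ["SMTP_HOST"], int(os.environ["SMTP_PORT"])) as smtp:
        smtp.starttls()
        smtp.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
        smtp.send_message(msg)

def run_job(job_dir: str, video_path: str, recipient: str, pipeline) -> None:
    """Run the pipeline callable, report the outcome, and always clean up."""
    try:
        markdown = pipeline(video_path)
        notify(recipient, "Your tutorial documentation is ready", markdown)
    except Exception as exc:
        notify(recipient, "Video processing failed", f"The job failed with: {exc}")
        raise
    finally:
        shutil.rmtree(job_dir, ignore_errors=True)  # remove temporary job outputs
```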
To contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Create an issue on GitHub
- Check the logs for detailed error information
- Verify all API keys and environment variables are correctly set
- Built for Pratt Institute
- Uses OpenAI's GPT-4o, o4-mini, and Whisper models
- Leverages FFmpeg for video processing
- Powered by Flask and Celery for web interface and task processing