Agentic Pipeline for Speaker Diarization and Quality Check

This notebook implements a Python pipeline that generates subtitles from audio files, with speaker diarization and a quality-check agent. It is suited to short clips with multiple speakers and produces SRT files with speaker-labeled dialogue.

Features

Speaker diarization with consistent labeling.
Dialogue-level transcription.
Confidence-based quality evaluation per segment.
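Clustering returns arbitrary cluster IDs, so "consistent labeling" needs some renaming convention. One plausible convention, assumed here and not confirmed by the notebook, is to number speakers in order of first appearance so that the first voice heard is always SPEAKER_1. The helper name `relabel_speakers` and the `SPEAKER_n` format are hypothetical.

```python
def relabel_speakers(cluster_ids):
    """Map raw cluster IDs to SPEAKER_1, SPEAKER_2, ... in order of first appearance.

    Hypothetical helper: the label format is an assumption for illustration.
    """
    mapping = {}
    labels = []
    for cid in cluster_ids:
        if cid not in mapping:
            # First time this cluster appears: give it the next speaker number.
            mapping[cid] = f"SPEAKER_{len(mapping) + 1}"
        labels.append(mapping[cid])
    return labels


print(relabel_speakers([2, 2, 0, 2, 1, 0]))
# → ['SPEAKER_1', 'SPEAKER_1', 'SPEAKER_2', 'SPEAKER_1', 'SPEAKER_3', 'SPEAKER_2']
```

Because the mapping depends only on segment order, two clustering runs that produce the same grouping with permuted IDs yield identical labels.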
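The README does not say which signals the quality agent scores. As a minimal sketch, assuming Whisper-style segment dicts carrying `avg_logprob`, `no_speech_prob` and `compression_ratio` (fields that openai-whisper's `transcribe()` returns per segment), a rule-based check could flag segments using the numeric thresholds Whisper uses as its own decoding defaults. The rules, the function names `assess_segment` and `quality_report`, and the report format are all hypothetical.

```python
import math

# Hypothetical thresholds, borrowed from the defaults of openai-whisper's
# transcribe(): logprob_threshold, no_speech_threshold, compression_ratio_threshold.
LOGPROB_THRESHOLD = -1.0
NO_SPEECH_THRESHOLD = 0.6
COMPRESSION_RATIO_THRESHOLD = 2.4


def assess_segment(seg):
    """Return (confidence, issues) for one Whisper-style segment dict."""
    # exp of the average token log-probability, roughly a per-token probability
    confidence = math.exp(seg["avg_logprob"])
    issues = []
    if seg["avg_logprob"] < LOGPROB_THRESHOLD:
        issues.append("low average log-probability")
    if seg["no_speech_prob"] > NO_SPEECH_THRESHOLD:
        issues.append("likely non-speech")
    if seg["compression_ratio"] > COMPRESSION_RATIO_THRESHOLD:
        issues.append("repetitive text, possible hallucination")
    return confidence, issues


def quality_report(segments):
    """Build one human-readable feedback line per segment."""
    lines = []
    for i, seg in enumerate(segments, 1):
        confidence, issues = assess_segment(seg)
        status = "OK" if not issues else "REVIEW: " + "; ".join(issues)
        lines.append(f"#{i} confidence={confidence:.2f} {status}")
    return lines
```

Keeping each rule as a separate check makes the feedback explainable: a flagged segment lists exactly which rule fired.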

Approach

Audio Preparation: Standardize audio to mono PCM WAV.
Transcription: Use Whisper to transcribe the audio into accurately timestamped segments.
Diarization: Assign speakers using pyannote embeddings and clustering, which is more reliable than simpler heuristics.
Merging: Combine adjacent segments from the same speaker so the dialogue reads naturally.
Output: Write an SRT file with speaker labels.
Quality Agent: Apply rule-based confidence scoring to each segment and report feedback.
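In practice the audio-preparation step is usually delegated to a tool such as ffmpeg, for example `ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le audio.wav` (the 16 kHz rate is an assumption based on Whisper's input rate). The sketch below shows only the downmixing part with the Python standard library, assuming the input is already a 16-bit PCM WAV; `to_mono_pcm16` is a hypothetical helper and it does not resample.

```python
import array
import sys
import wave


def to_mono_pcm16(src_path, dst_path):
    """Downmix a 16-bit PCM WAV file to mono, keeping its sample rate."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sampwidth = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    if sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit PCM input")

    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Average the channels of each frame into a single sample.
    mono = array.array("h", (
        sum(samples[i:i + n_channels]) // n_channels
        for i in range(0, len(samples), n_channels)
    ))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```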
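The README names pyannote embeddings plus clustering but not the clustering algorithm. Assuming one fixed-length embedding per Whisper segment and a known speaker count (matching the fixed-speaker-count limitation), average-linkage agglomerative clustering on cosine distance is one common choice. The pure-Python version below stands in for a library implementation such as scikit-learn's `AgglomerativeClustering`; the toy 2-D vectors in the usage line replace real pyannote embeddings, and `cluster_embeddings` is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_embeddings(embeddings, n_speakers):
    """Average-linkage agglomerative clustering into n_speakers clusters.

    Returns one cluster index per input embedding.
    """
    dist = [[cosine_distance(a, b) for b in embeddings] for a in embeddings]
    clusters = [[i] for i in range(len(embeddings))]  # start with singletons

    def linkage(c1, c2):
        # Mean pairwise distance between members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_speakers:
        # Merge the closest pair of clusters.
        _, p, q = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p stays valid

    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


labels = cluster_embeddings([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], n_speakers=2)
```

Cosine distance is used because speaker embeddings are typically compared by direction rather than magnitude; fixing `n_speakers` is exactly what makes the speaker count a limitation.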
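Assuming "combine segments" means joining consecutive segments from the same speaker, a minimal merging step could look like the sketch below. The segment dict shape (`start`, `end` in seconds, `speaker`, `text`), the helper name `merge_segments`, and the one-second `max_gap` are assumptions for illustration.

```python
def merge_segments(segments, max_gap=1.0):
    """Merge consecutive segments from the same speaker.

    Two segments are joined when the speaker matches and the silence
    between them is at most max_gap seconds (a hypothetical default).
    """
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["speaker"] == seg["speaker"]
            and seg["start"] - prev["end"] <= max_gap
        ):
            # Extend the previous segment instead of starting a new one.
            prev["end"] = max(prev["end"], seg["end"])
            prev["text"] = f'{prev["text"]} {seg["text"]}'.strip()
        else:
            merged.append(dict(seg))  # copy so the input list is not mutated
    return merged
```

The gap limit keeps a long pause by the same speaker as a separate subtitle, which usually reads better than one very long cue.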

Pipeline Flow

Input audio → Convert → Transcribe → Embed & Cluster → Merge → SRT & Quality Report.
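The final SRT stage follows the standard SubRip layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the subtitle text, and a blank line between cues. The sketch assumes merged segments shaped like `{"start", "end", "speaker", "text"}` and a `[SPEAKER_n]` text prefix for labels; the label format and the helper names `srt_timestamp` and `to_srt` are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render speaker-labeled segments as an SRT document string."""
    cues = []
    for i, seg in enumerate(segments, 1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text']}\n"
        )
    # Joining with "\n" leaves one blank line between consecutive cues.
    return "\n".join(cues)


print(srt_timestamp(3661.042))
# → 01:01:01,042
```

Working in integer milliseconds avoids floating-point drift when splitting the time into hours, minutes, seconds and milliseconds.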

Limitations and Improvements

Limitations: The number of speakers must be fixed in advance; results are best on clean audio.
Improvements: Add overlapped-speech detection, LLM-generated feedback, and automatic estimation of the number of speakers.

shivxmr/speech-diarization
