MedVoice Service

Production-ready REST API for audio processing using WhisperX with Temporal workflow orchestration. Features transcription, alignment, diarization, and medical RAG integration with local LLMs via LM Studio.

Features

Audio Transcription - State-of-the-art speech-to-text with WhisperX
Speaker Diarization - Multi-speaker identification and segmentation
Temporal Workflows - Asynchronous job processing with retry logic
Medical Processing - PHI detection, SOAP notes, entity extraction
Web Interface - Streamlit UI for live recording and transcription
Local LLM Integration - LM Studio support for medical AI

Requirements

Python: 3.11+
HF_TOKEN: Required for model downloads (get from HuggingFace)

Software Dependencies:

Docker Desktop - Container runtime (download)
LM Studio - Local LLM server for medical AI features (download)

Prerequisites (macOS)

System Dependencies:

Homebrew - Package manager for macOS (install)

brew install ffmpeg pkg-config make

Python Package Manager:

curl -LsSf https://astral.sh/uv/install.sh | sh

Quick Start

Docker (Recommended)

# Configure environment
cp .env.example .env
# Edit .env with your HF_TOKEN

# Install dependencies
make install

# Build and start all services
make build

# Access services
# API: http://localhost:8000/docs
# Temporal UI: http://localhost:8233
# Web UI: http://localhost:8501

Local Development

Temporal CLI: Required for local development (install from GitHub releases)

# Configure environment
cp .env.example .env

# Install dependencies
make install

# Start full application (FastAPI + Temporal + Streamlit)
make dev

Services

Service	URL	Description
FastAPI	http://localhost:8000	REST API with Scalar/Swagger docs
Web UI	http://localhost:8501	Web interface for audio processing
Temporal UI	http://localhost:8233	Workflow monitoring dashboard

Architecture

Client → FastAPI → Temporal → Activities (Transcribe → Align → Diarize)
                    ↓
                 Patient DB (SQLite)
                    ↓
              Medical LLM (LM Studio)
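
The flow in the diagram can be pictured as three stages run in order, each retried on failure. The sketch below uses only `asyncio` from the standard library rather than the Temporal SDK, so it illustrates the ordering and retry idea, not the service's actual workflow code. The stage functions, `run_with_retry`, and the dictionary fields are hypothetical placeholders.

```python
import asyncio
from typing import Any, Awaitable, Callable

Stage = Callable[[dict], Awaitable[dict]]


async def run_with_retry(stage: Stage, payload: dict,
                         attempts: int = 3, delay: float = 0.0) -> dict:
    """Run one stage, retrying up to `attempts` times before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return await stage(payload)
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            await asyncio.sleep(delay * attempt)  # linear backoff between tries
    raise ValueError("attempts must be >= 1")


# Placeholder stages: each adds one field to stand in for real model output.
async def transcribe(job: dict) -> dict:
    return {**job, "segments": ["hello doctor"]}


async def align(job: dict) -> dict:
    return {**job, "aligned": True}


async def diarize(job: dict) -> dict:
    return {**job, "speakers": ["SPEAKER_00"]}


async def process_audio(path: str) -> dict:
    """Chain Transcribe -> Align -> Diarize, as in the diagram."""
    job: dict = {"path": path}
    for stage in (transcribe, align, diarize):
        job = await run_with_retry(stage, job)
    return job
```

Calling `asyncio.run(process_audio("visit.wav"))` returns the accumulated job dictionary. In the real service, Temporal takes the place of `run_with_retry` and also persists progress between stages, which this sketch does not attempt.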

API Endpoints

Speech-to-Text

POST /speech-to-text - Full processing pipeline
POST /speech-to-text-url - Process from URL
GET /tasks/{task_id} - Check workflow status
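
Because processing is asynchronous, a client typically submits audio and then polls `GET /tasks/{task_id}` until the workflow finishes. The following is a minimal polling sketch using only the standard library. The `status` field and its in-progress values are assumptions, not taken from the service; check http://localhost:8000/docs for the real response schema.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "http://localhost:8000"  # FastAPI address from the Services table


def task_url(task_id: str, base_url: str = BASE_URL) -> str:
    """Build the status URL for GET /tasks/{task_id}."""
    return f"{base_url.rstrip('/')}/tasks/{task_id}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_task(task_id: str,
                  fetch: Callable[[str], dict] = fetch_json,
                  interval: float = 2.0,
                  max_polls: int = 150) -> dict:
    """Poll the status endpoint until the task leaves its in-progress states.

    ASSUMPTION: the response carries a "status" field, and the values below
    mean work is still in progress.
    """
    in_progress = {"pending", "running", "processing"}
    for _ in range(max_polls):
        body = fetch(task_url(task_id))
        if str(body.get("status", "")).lower() not in in_progress:
            return body
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

The `fetch` parameter is injectable so the loop can be exercised without a running server.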

Medical (requires LM Studio)

POST /medical/process - Full medical pipeline
POST /medical/soap - Generate SOAP note
POST /medical/entities - Extract medical entities
POST /medical/chat - RAG-powered chatbot
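
As an illustration of calling one of these endpoints, the sketch below prepares a JSON request for `POST /medical/soap`. The request body shape, in particular the `transcript` field, is an assumption made for the example; the actual schema is listed in the API docs at http://localhost:8000/docs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # FastAPI address from the Services table


def build_soap_request(transcript: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /medical/soap.

    ASSUMPTION: the endpoint accepts a JSON body with a "transcript" field.
    """
    body = json.dumps({"transcript": transcript}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/medical/soap",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the API and LM Studio running, `send(build_soap_request("Patient reports a dry cough for three days."))` would return the decoded response. The long default timeout reflects that local LLM generation can be slow.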

Admin

GET /admin - Database interface (SQLAdmin)
GET /admin/patients - List all patients
GET /admin/database/stats - Database statistics

Supported Formats

Audio: .oga, .m4a, .aac, .wav, .amr, .wma, .awb, .mp3, .ogg

Video: .wmv, .mkv, .avi, .mov, .mp4
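
A client can check a file against these lists before uploading. This is a small sketch using the extensions listed above; the function name `media_kind` is hypothetical, and case-insensitive matching is an assumption about how the service treats extensions.

```python
from pathlib import Path
from typing import Optional

AUDIO_EXTENSIONS = {".oga", ".m4a", ".aac", ".wav", ".amr",
                    ".wma", ".awb", ".mp3", ".ogg"}
VIDEO_EXTENSIONS = {".wmv", ".mkv", ".avi", ".mov", ".mp4"}


def media_kind(filename: str) -> Optional[str]:
    """Return "audio", "video", or None for an unsupported extension."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```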

Available Models

Standard Models: tiny, base, small, medium, large-v3-turbo

Distilled: distil-large-v3, distil-medium.en, distil-small.en

Specialized: nyrahealth/faster_CrisperWhisper (medical)

Development

Commands

# Start services with Docker
make build            # Build all services
make up               # Start all services
make down             # Stop all services

# Start services without Docker
make dev              # Full application (API + Temporal + Streamlit)
make server           # FastAPI only
make worker           # Temporal + worker
make web              # Web UI only

# Stop services
make stop             # Stop all processes

# Temporal management
make temporal-fresh   # Clean restart Temporal
make check-activities # Monitor running workflows

# Testing
make test             # All tests
make unit-test        # Unit tests with coverage
make integration-test # Integration tests

# Code quality
make lint             # Run linters
make format           # Format code

Medical RAG with LM Studio

Setup

# Install LM Studio (https://lmstudio.ai/)

# Download models
# - MedAlpaca-7B or Meditron-7B (generation)
# - nomic-embed-text-v1.5 (embeddings)

# Configure .env
cp .env.example .env

# Start LM Studio server
# Local Server tab → Select model → Start Server

Features

PHI detection & anonymization
Medical entity extraction (diagnoses, medications, procedures)
SOAP note generation (Subjective, Objective, Assessment, Plan)
Semantic search with vector embeddings (FAISS)
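
To show the idea behind semantic search, the sketch below ranks documents by cosine similarity between embedding vectors. It deliberately swaps the FAISS index for a brute-force scan in pure Python and uses toy 3-dimensional vectors in place of real embeddings, so it is an illustration of the concept rather than the service's implementation. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          docs: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Rank documents by similarity to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones would come from the embedding model.
notes = {
    "note-cough": [0.9, 0.1, 0.0],
    "note-fracture": [0.0, 0.2, 0.9],
    "note-fever": [0.7, 0.6, 0.1],
}
ranked = top_k([1.0, 0.2, 0.0], notes)
print([doc_id for doc_id, _ in ranked])
```

A vector index such as FAISS answers the same nearest-neighbor question without scanning every document, which matters once the collection grows.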

Performance

GPU (RTX 4090/A10): ~15-25s per consultation

CPU: ~50-90s per consultation

Documentation

Troubleshooting

Model download fails

# Verify HF_TOKEN
curl -H "Authorization: Bearer YOUR_TOKEN" https://huggingface.co/api/whoami-v2

Temporal workflows stuck

make temporal-fresh  # Clean restart

LM Studio not responding

curl http://localhost:1234/v1/models
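
The same check can be scripted. The sketch below queries the models endpoint and returns the model ids it reports, assuming the server answers with an OpenAI-style listing of the form `{"data": [{"id": ...}]}`. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import List

LM_STUDIO_URL = "http://localhost:1234/v1/models"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} listing."""
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def list_models(url: str = LM_STUDIO_URL, timeout: float = 5.0) -> List[str]:
    """Return model ids reported by the server, or [] if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (urllib.error.URLError, TimeoutError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as "not ready".
        return []
```

An empty list from `list_models()` suggests the LM Studio server is not started or no model is available; a non-empty list confirms it is reachable.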

Related Projects

License

MIT
