High-performance speech recognition service powered by FunASR with multi-worker parallel processing.
- Multi-language ASR (Chinese, English, Cantonese, Japanese, Korean)
- Speaker diarization and timestamp annotation
- Task queue with parallel worker processing
- Flexible GPU/CPU deployment
- RESTful API with FastAPI
```bash
docker compose up -d
```

Or run locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
python main.py
```

The server runs at http://localhost:8000.
| Variable | Description | Default |
|---|---|---|
| `NUM_WORKERS` | Number of parallel workers | `1` |
| `DEVICE_TEMPLATE` | Device allocation pattern | `cuda:0` |
| `MODEL_DIR` | ASR model directory | `/model/SenseVoiceSmall` |
| `MAX_QUEUE_SIZE` | Task queue capacity | `1000` |
| `TASK_TIMEOUT` | Task timeout (seconds) | `300` |
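How `DEVICE_TEMPLATE` maps workers to devices can be illustrated with a small sketch. This assumes the `{worker_id}` placeholder is expanded with plain string formatting; the function name is illustrative, not the service's actual internals:

```python
# Sketch: expand a DEVICE_TEMPLATE into one device string per worker.
# Assumes the template's {worker_id} placeholder uses str.format semantics.

def assign_devices(device_template: str, num_workers: int) -> list[str]:
    """Return the device each worker would be pinned to."""
    return [device_template.format(worker_id=i) for i in range(num_workers)]

# All workers share one GPU:
print(assign_devices("cuda:0", 3))            # ['cuda:0', 'cuda:0', 'cuda:0']
# One GPU per worker:
print(assign_devices("cuda:{worker_id}", 3))  # ['cuda:0', 'cuda:1', 'cuda:2']
```

A template without a placeholder (`cuda:0` or `cpu`) simply yields the same device for every worker.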
```bash
# Single GPU
DEVICE_TEMPLATE="cuda:0"

# Multi-GPU (auto-assign: Worker 0 → GPU 0, Worker 1 → GPU 1, ...)
DEVICE_TEMPLATE="cuda:{worker_id}"

# CPU only
DEVICE_TEMPLATE="cpu"
```

### POST /api/v1/transcribe

Example:
```bash
curl -X POST "http://localhost:8000/api/v1/transcribe" \
  -F "[email protected]" \
  -F "language=auto"
```

Response:
```json
{
  "text": "Complete transcription",
  "sentence_info": [
    {
      "start_time": "00:00:00",
      "end_time": "00:00:03",
      "sentence": "First sentence",
      "speaker": 0
    }
  ]
}
```

### GET /api/v1/queue/stats

Returns queue size, worker status, and task counts.
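The `sentence_info` timestamps from the transcribe response lend themselves to subtitle export. A minimal sketch, assuming the field names from the sample response above; the SRT conversion itself is illustrative, not a service feature:

```python
# Sketch: turn the transcribe response's sentence_info into SRT subtitle text.
# Field names (start_time, end_time, sentence, speaker) follow the sample
# response; SRT export is an illustration, not part of the API.

def sentence_info_to_srt(sentence_info: list[dict]) -> str:
    blocks = []
    for i, seg in enumerate(sentence_info, start=1):
        start = seg["start_time"] + ",000"   # SRT expects HH:MM:SS,mmm
        end = seg["end_time"] + ",000"
        text = f'[Speaker {seg["speaker"]}] {seg["sentence"]}'
        blocks.append(f"{i}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"

sample = [{"start_time": "00:00:00", "end_time": "00:00:03",
           "sentence": "First sentence", "speaker": 0}]
print(sentence_info_to_srt(sample))
```

Note the sketch pads timestamps with `,000` because the sample response carries second-level precision only.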
### GET /api/v1/health

```bash
# Concurrent load test
python test_api.py audio.wav
```

```
Request → TaskQueue ──→ Worker 0 (GPU 0)
                    ──→ Worker 1 (GPU 0)
                    ──→ Worker N (GPU 0)
```
Each worker runs an independent model instance for parallel processing.
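The queue-plus-workers layout above can be sketched with the standard library. This is a toy stand-in, not the service's code: `DummyModel` replaces the real per-worker FunASR instance, and the real queue/worker internals will differ:

```python
# Sketch: a bounded task queue feeding N workers, each with its own model
# instance (as in the diagram above). DummyModel stands in for FunASR.
import queue
import threading

class DummyModel:
    def __init__(self, device: str):
        self.device = device
    def transcribe(self, audio: str) -> str:
        return f"transcript of {audio} ({self.device})"

def worker(worker_id: int, tasks: queue.Queue, results: dict) -> None:
    # Independent model per worker; with DEVICE_TEMPLATE="cuda:{worker_id}"
    # the device would instead depend on worker_id.
    model = DummyModel(device="cuda:0")
    while True:
        audio = tasks.get()
        if audio is None:          # poison pill: shut this worker down
            tasks.task_done()
            break
        results[audio] = model.transcribe(audio)
        tasks.task_done()

NUM_WORKERS = 3
tasks: queue.Queue = queue.Queue(maxsize=1000)   # MAX_QUEUE_SIZE
results: dict = {}
threads = [threading.Thread(target=worker, args=(i, tasks, results))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for audio in ["a.wav", "b.wav", "c.wav"]:
    tasks.put(audio)
for _ in threads:
    tasks.put(None)
tasks.join()                       # wait until every task is processed
print(results)
```

The bounded `maxsize` mirrors `MAX_QUEUE_SIZE`: submissions beyond capacity block (or can be rejected) instead of growing memory without limit.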