Models & Languages Overview
An overview of Deepgram’s speech-to-text models and supported languages.
All models default to language=en unless otherwise specified via the language parameter.
To request any Deepgram Model, change MODEL_OPTION to the Model you want to use.
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
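As a minimal sketch of how these two placeholders fit together, the Python below builds (but does not send) a transcription request for a hosted audio file. The endpoint `https://api.deepgram.com/v1/listen`, the `Authorization: Token ...` header, and the JSON `url` body are assumptions based on Deepgram's public API rather than anything stated on this page. The helper name `build_listen_request` and the audio URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for pre-recorded audio; confirm against the API reference.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(model, api_key, audio_url, language=None):
    """Build (without sending) a transcription request for a hosted audio file.

    `language` is left out of the query string unless given, in which case
    the documented default (language=en) applies.
    """
    params = {"model": model}
    if language is not None:
        params["language"] = language
    url = DEEPGRAM_LISTEN_URL + "?" + urllib.parse.urlencode(params)
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Token " + api_key,
            "Content-Type": "application/json",
        },
    )


# Replace MODEL_OPTION and YOUR_DEEPGRAM_API_KEY before sending.
request = build_listen_request(
    model="MODEL_OPTION",
    api_key="YOUR_DEEPGRAM_API_KEY",
    audio_url="https://example.com/audio.wav",  # hypothetical audio file
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Passing `language="es"` (or any other supported code) adds the `language` parameter to the query string; leaving it out relies on the default described above.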
Flux is the first conversational speech recognition model built specifically for voice agents. Unlike traditional STT, which passively transcribes what is said, Flux understands conversational flow and automatically handles turn-taking.
Flux tackles the most critical challenges for voice agents today: knowing when to listen, when to think, and when to speak. The model features first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines, all with Nova-3 level accuracy.
Nova-3 represents a significant leap forward in speech AI technology, featuring substantial improvements in accuracy and real-world application capabilities. The model delivers industry-leading performance with a 54.2% reduction in word error rate (WER) for streaming and 47.4% for batch processing compared to competitors.
Nova-3 introduces groundbreaking features including real-time multilingual conversation transcription, enhanced comprehension of domain-specific terminology, and optional personal information redaction. Notably, it’s the first voice AI model to offer self-serve customization, enabling instant vocabulary adaptation without model retraining. In multilingual testing, Nova-3 demonstrated superior performance across all seven tested languages, with particularly strong results showing up to 8:1 preference ratios in certain languages.
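To make the word error rate figures quoted for Nova-3 concrete, here is a small sketch of how WER and a percentage reduction are commonly computed. It assumes the standard definition (word-level edit distance divided by the number of reference words) and assumes the reduction is relative to a baseline WER; the page does not state either, and the sample sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Invented transcripts, purely for illustration.
reference = "please transfer me to the billing department"
baseline = word_error_rate(reference, "please transfer me to a building department")
improved = word_error_rate(reference, "please transfer me to the building department")
print(baseline, improved, relative_reduction(baseline, improved))
```

In this toy case the baseline transcript gets 2 of 7 words wrong and the improved one gets 1 of 7 wrong, a 50% relative reduction. Under the same assumption, a 54.2% reduction would mean the model makes a little under half as many word errors as the baseline on the same audio.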
Nova-2 is recommended for use cases with languages not yet supported by nova-3, and for filler word identification.
Nova (Nova 1) is the predecessor to Nova-2.
Enhanced is recommended for lower word error rates than Base, high-accuracy timestamps, and use cases that require keyword boosting.
Base is recommended for large transcription volumes and high-accuracy timestamps.
Whisper models are less scalable than all other Deepgram models due to their inherent model architecture. All non-Whisper models will return results faster and scale to higher load.
Deepgram Whisper Cloud is a fully managed API that gives you access to Deepgram’s version of OpenAI’s Whisper model. For a deeper dive into this offering, see the Deepgram Whisper Cloud guide.
Deepgram’s Whisper Cloud models can be called with the following syntax: