Introduction
Argmax SDK is a collection of turn-key on-device inference frameworks:

- WhisperKit Pro
  - File Transcription
  - Real-time Transcription
  - Language Detection
  - Word Timestamps
  - Custom Vocabulary
- SpeakerKit Pro
  - Voice Activity Detection
  - Speaker Diarization
  - Diarized Transcription
Architecture
Argmax SDK follows an open-core architecture where the Pro SDK extends the Open-source SDK:
- Argmax Open-source SDK: WhisperKit
- Argmax Pro SDK: WhisperKit Pro, SpeakerKit Pro
This architecture was explicitly designed to facilitate seamless upgrades and downgrades between the free tier (Open-source SDK) and the paid tier (Pro SDK).
Please see Open-source vs Pro SDK for a detailed feature set comparison.
Integration
Native Apps
Argmax Pro SDK may be integrated as a Swift Package via SwiftPM for native apps.
Please see Installation for more details.
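As an illustration of the SwiftPM route, a minimal `Package.swift` declaring the open-source WhisperKit dependency might look like the sketch below. The app name, platform minimums, and version pin are assumptions, and the Pro SDK package URL comes with your Argmax license rather than from this public repository:

```swift
// swift-tools-version:5.9
// Sketch: pulling in the open-source WhisperKit via SwiftPM.
// The Pro SDK is declared the same way, using the package URL and version
// supplied with your Argmax license; the version pin below is illustrative.
import PackageDescription

let package = Package(
    name: "MyTranscriptionApp",            // hypothetical app target
    platforms: [.iOS(.v16), .macOS(.v13)], // assumed platform minimums
    dependencies: [
        .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.9.0"),
    ],
    targets: [
        .executableTarget(
            name: "MyTranscriptionApp",
            dependencies: [.product(name: "WhisperKit", package: "WhisperKit")]
        ),
    ]
)
```

Because the Pro SDK extends the open-source package, swapping between tiers is primarily a matter of changing this dependency declaration.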
Other Apps
Argmax Local Server is built using Argmax Pro SDK and currently offers Real-time Transcription.
Key features include:
- Node and Python client packages
- API compatible with Deepgram
- macOS only
Please see Using Local Server for more details.
Use Cases
Ambient AI for Healthcare
- Real-time streaming transcription of doctor-patient conversations
- Medically-tuned custom model support
- Speaker diarization to attribute statements to doctor and patient
- Example product built with Argmax SDK: ModMed Scribe
AI Meeting Notes
- Real-time streaming transcription of work meetings
- Custom vocabulary for accurate person and company names
- Speaker diarization to attribute statements to each meeting attendee
- Example product built with Argmax SDK: Macwhisper
Personal Dictation
- Ultra low-latency dictation
- Custom vocabulary for accurate person and company names
- Example product built with Argmax SDK: superwhisper
Video Content Creation
- Offline captioning (Word timestamps, SRT and VTT output formats)
- Live captioning (Real-time transcription)
- Silence removal (Voice Activity Detection)
- Text-based video editing (Word timestamps)
- Example product built with Argmax SDK: Detail
Why on-device?
Accuracy
On-device inference does not imply the use of smaller, less accurate models. Argmax builds systems that match or exceed cloud-based API-level accuracy:
WhisperKit Pro supports the largest and most accurate open-source speech-to-text models (Whisper Large V3) on ALL iOS and macOS devices released since 2020 (iPhone 12 or newer, M1 Mac or newer). SpeakerKit Pro supports the state-of-the-art Pyannote-v4 system on an even wider range of devices.
For the ever-shrinking fraction of users on even older devices, Argmax offers hybrid deployment that falls back to server-side inference, preserving a uniform-accuracy user experience.
Upholding accuracy is our top priority (even more so than speed). We continuously benchmark our products on industry-standard test sets:
- Accuracy and speed benchmarks are continuously published on Argmax OpenBench.
- Competitive benchmarks are published in our WhisperKit (ICML) and SpeakerKit (Interspeech) papers.
Low Latency
Applications built with real-time inference enjoy lower latency when deployed on device instead of the cloud because on-device is:
- Optimized for minimum latency for a single user instead of maximum throughput (at the cost of higher latency) for many concurrent users
- Decoupled from global inference traffic jams that occasionally make cloud services unavailable or unexpectedly slow
- Not subject to internet roundtrip latency
Everything Else
| Concern | On-device (with Argmax) | Cloud-based |
|---|---|---|
| Availability | 100% by definition | < 100% uptime |
| Scalability (Usage) | Unlimited | Rate-limited & concurrency-limited |
| Scalability (Cost) | Fixed | Unbounded (usage-based) |
| Transparency | Open-core, transparent versioning | Proprietary, silent versioning |
| Data Privacy | Processed locally | Upload required |