Introduction

Argmax SDK is a collection of turn-key on-device inference frameworks:

  • WhisperKit Pro

    • File Transcription
    • Real-time Transcription
    • Language Detection
    • Word Timestamps
    • Custom Vocabulary
  • SpeakerKit Pro

    • Voice Activity Detection
    • Speaker Diarization
    • Diarized Transcription

Architecture

Argmax SDK follows an open-core architecture where the Pro SDK extends the Open-source SDK:

  • Argmax Open-source SDK: WhisperKit
  • Argmax Pro SDK: WhisperKit Pro, SpeakerKit Pro

This architecture was explicitly designed to facilitate seamless upgrades and downgrades between the free tier (Open-source SDK) and the paid tier (Pro SDK).

Please see Open-source vs Pro SDK for a detailed feature set comparison.

Integration

Native Apps

Argmax Pro SDK may be integrated into native apps as a Swift Package via SwiftPM.

Please see Installation for more details.

Other Apps

Argmax Local Server is built using Argmax Pro SDK and currently offers Real-time Transcription.

Key features include:

  • Node and Python client packages
  • API compatible with Deepgram
  • macOS only

Please see Using Local Server for more details.
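
Because the server is described as API compatible with Deepgram, a client can plausibly reuse Deepgram-style message handling. The sketch below assumes the server emits streaming results in the JSON shape Deepgram documents for live transcription (a "Results" message carrying channel.alternatives[0].transcript and an is_final flag). That assumption is not confirmed by this page, `final_transcript` is a hypothetical helper, and the sample message is made up for illustration.

```python
import json
from typing import Optional


def final_transcript(message: str) -> Optional[str]:
    """Return the transcript of a finalized result message, else None.

    Assumes a Deepgram-style streaming payload; the actual Local Server
    payload may differ.
    """
    payload = json.loads(message)
    # Ignore metadata messages and interim (not yet finalized) results.
    if payload.get("type") != "Results" or not payload.get("is_final"):
        return None
    alternatives = payload.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None
    return alternatives[0].get("transcript", "")


# Made-up sample message in the assumed shape.
sample = json.dumps({
    "type": "Results",
    "is_final": True,
    "channel": {"alternatives": [{"transcript": "hello world"}]},
})
print(final_transcript(sample))  # prints: hello world
```

Filtering on is_final keeps interim hypotheses out of the stored transcript, while a live UI could still display them as they arrive.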

Use Cases

Ambient AI for Healthcare

  • Real-time streaming transcription of doctor-patient conversations
  • Medically-tuned custom model support
  • Speaker diarization to attribute statements to doctor and patient
  • Example product built with Argmax SDK: ModMed Scribe
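
Attributing statements to speakers can be pictured as matching word timestamps against speaker turns. The following is a minimal sketch of that idea, not the SDK's method: the `Word` and `SpeakerTurn` types, the maximum-overlap rule, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlap(word: Word, turn: SpeakerTurn) -> float:
    """Length in seconds of the time range shared by a word and a turn."""
    return max(0.0, min(word.end, turn.end) - max(word.start, turn.start))


def attribute_statements(words, turns):
    """Group consecutive words by the speaker whose turn overlaps them most."""
    statements = []  # list of (speaker, text) tuples
    for word in words:
        best = max(turns, key=lambda t: overlap(word, t), default=None)
        if best is None or overlap(word, best) == 0.0:
            speaker = "unknown"  # no turn covers this word
        else:
            speaker = best.speaker
        if statements and statements[-1][0] == speaker:
            # Same speaker as the previous word: extend the statement.
            statements[-1] = (speaker, statements[-1][1] + " " + word.text)
        else:
            statements.append((speaker, word.text))
    return statements


words = [Word("How", 0.0, 0.3), Word("are", 0.3, 0.5),
         Word("you", 0.5, 0.8), Word("Fine", 1.2, 1.6)]
turns = [SpeakerTurn("doctor", 0.0, 1.0), SpeakerTurn("patient", 1.0, 2.0)]
print(attribute_statements(words, turns))
# prints: [('doctor', 'How are you'), ('patient', 'Fine')]
```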

AI Meeting Notes

  • Real-time streaming transcription of work meetings
  • Custom vocabulary for accurate person and company names
  • Speaker diarization to attribute statements to each meeting attendee
  • Example product built with Argmax SDK: MacWhisper

Personal Dictation

  • Ultra low-latency dictation
  • Custom vocabulary for accurate person and company names
  • Example product built with Argmax SDK: superwhisper

Video Content Creation

  • Offline captioning (Word timestamps, SRT and VTT output formats)
  • Live captioning (Real-time transcription)
  • Silence removal (Voice Activity Detection)
  • Text-based video editing (Word timestamps)
  • Example product built with Argmax SDK: Detail
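
To illustrate how word timestamps feed offline captioning, here is a minimal sketch that turns timestamped words into SRT text. The input shape (tuples of text, start, end in seconds), the fixed words-per-cue chunking, and the function names are assumptions for illustration; they are not the SDK's API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words_per_cue: int = 7) -> str:
    """Build SRT text from (text, start, end) tuples, chunked into fixed-size cues."""
    cues = []
    for i in range(0, len(words), max_words_per_cue):
        chunk = words[i:i + max_words_per_cue]
        start, end = chunk[0][1], chunk[-1][2]  # first word start, last word end
        text = " ".join(w[0] for w in chunk)
        index = len(cues) + 1  # SRT cue numbers start at 1
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues


timed_words = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9), ("again", 1.0, 1.5)]
print(words_to_srt(timed_words, max_words_per_cue=2))
```

A VTT variant would differ mainly in starting the file with a WEBVTT header line and using a period instead of a comma before the milliseconds.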

Why on-device?

Accuracy

On-device inference does not imply smaller, less accurate models. Argmax builds systems that match or exceed the accuracy of cloud-based APIs:

  • WhisperKit Pro supports the largest and most accurate open-source speech-to-text models (Whisper Large V3) on ALL iOS and macOS devices released since 2020 (iPhone 12 or newer, M1 Mac or newer).
  • SpeakerKit Pro supports the state-of-the-art Pyannote-v4 system on an even wider range of devices.

For the ever-shrinking fraction of users with even older devices, Argmax offers hybrid deployment that falls back to server-side inference, so the user experience retains uniform accuracy.
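
The hybrid fallback can be pictured as a simple routing decision. The sketch below is only an illustration under stated assumptions: it gates on device release year (taken from the "released since 2020" statement above), whereas a real deployment would more likely check the specific chip or model, and `pick_backend`, `transcribe`, and the two transcriber callables are hypothetical names.

```python
from typing import Callable

# From the text: on-device support covers devices released since 2020.
MIN_ON_DEVICE_RELEASE_YEAR = 2020

# Hypothetical signature: raw audio bytes in, transcript text out.
Transcriber = Callable[[bytes], str]


def pick_backend(device_release_year: int) -> str:
    """Choose on-device inference when the device is new enough, else the server."""
    if device_release_year >= MIN_ON_DEVICE_RELEASE_YEAR:
        return "on_device"
    return "server"


def transcribe(audio: bytes, device_release_year: int,
               on_device: Transcriber, server: Transcriber) -> str:
    """Route the audio to whichever backend pick_backend selects."""
    run = on_device if pick_backend(device_release_year) == "on_device" else server
    return run(audio)
```

Keeping the routing rule in one small function means the caller's code path is the same on every device; only the backend behind it changes.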

Upholding accuracy is our top priority (even more so than speed). We continuously benchmark our products on industry-standard test sets.

Low Latency

Applications built with real-time inference enjoy lower latency when deployed on device instead of in the cloud because on-device inference is:

  • Optimized for minimum latency for a single user instead of maximum throughput (at the cost of higher latency) for many concurrent users
  • Decoupled from global inference traffic jams, which occasionally cause cloud services to be unavailable or unexpectedly slow
  • Not subject to internet roundtrip latency

Everything Else

Concern             | On-device (with Argmax)           | Cloud-based
Availability        | 100% by definition                | < 100% Uptime
Scalability (Usage) | Unlimited                         | Rate-limited & concurrency-limited
Scalability (Cost)  | Fixed                             | Unlimited (Usage-based)
Transparency        | Open-core, transparent versioning | Proprietary, silent versioning
Data Privacy        | Processed locally                 | Upload required