kitsune-stt

A speech-to-text implementation of Voxtral speech recognition in Rust, using candle.

The model used in the conversion is https://huggingface.co/mistralai/Voxtral-Mini-3B-2507

Features

  • 🎤 Speech-to-Text: Convert audio to text using the Voxtral-Mini-3B model
  • 🚀 GPU Acceleration: CUDA and cuDNN support for faster inference
  • 📦 Audio Format and Codec Support: WAV, MP3, FLAC, OGG, M4A, and more; see https://docs.rs/symphonia/latest/symphonia/index.html for the full list
  • Performance: F16 weights to reduce memory use, and chunked processing of the audio
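
As a rough illustration of what chunked processing means here, the sketch below splits decoded mono samples into fixed-length windows so that each window can be passed to the model separately. It is not taken from this repository: the function name split_into_chunks, the 16 kHz sample rate, and the 30-second window are assumptions chosen for the example.

```rust
/// Assumed sample rate in Hz (hypothetical; not confirmed for this project).
const SAMPLE_RATE: usize = 16_000;
/// Assumed window length in seconds (hypothetical value).
const CHUNK_SECONDS: usize = 30;

/// Hypothetical helper: splits mono PCM samples into windows of `chunk_len`
/// samples. The final window is shorter when the input length is not an
/// exact multiple of `chunk_len`.
fn split_into_chunks(samples: &[f32], chunk_len: usize) -> Vec<&[f32]> {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    samples.chunks(chunk_len).collect()
}
```

Calling split_into_chunks(&samples, SAMPLE_RATE * CHUNK_SECONDS) on 70 seconds of audio would give two full windows and one shorter trailing window. Borrowing slices rather than copying keeps the peak memory use close to the size of the decoded audio.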

Quick Start

Installation

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone the repository
git clone https://github.com/paazmaya/kitsune-stt.git
cd kitsune-stt

There are two optional Cargo features, cuda and cudnn. Each requires additional libraries to be installed.

Running

GPU (Recommended):

cargo run --all-features --release -- audio.wav

CPU Only:

cargo run --release -- --cpu audio.wav

Transcribe Audio

The following example commands create an audio.txt file in the same folder as the source audio file.

# Transcribe an audio file
cargo run --release -- --input audio.wav

# Force CPU mode, even in a build with the cuda feature enabled
cargo run --release --features cuda -- --cpu --input audio.wav
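
The output location described above (a .txt file next to the source audio) can be derived from the input path with the standard library alone. This is a minimal sketch of that idea, not the project's actual code; the helper name transcript_path is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the input path with its extension replaced
/// by `txt`, so the transcript lands in the same folder as the audio file.
fn transcript_path(input: &Path) -> PathBuf {
    input.with_extension("txt")
}
```

For an input of recordings/audio.wav this yields recordings/audio.txt. Note that with_extension only replaces the last extension, so a name with several dots keeps everything before the final one.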

Testing

Run the complete test suite:

# All tests
cargo test

# With all features
cargo test --all-features

Code Quality

# Format code
cargo fmt --all

# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings

# Security audit
cargo install cargo-audit
cargo audit

Development

See CONTRIBUTING.md for detailed development guidelines.

Requirements

  • Rust 1.70+
  • pkg-config (Linux)
  • CUDA toolkit (optional, for GPU)
  • ~7 GB disk space (for model files)

License

MIT License - see LICENSE file for details.
