kitsune-stt

A speech-to-text implementation of Voxtral speech recognition in Rust, using candle.

The model used in the conversion is https://huggingface.co/mistralai/Voxtral-Mini-3B-2507

Features

🎤 Speech-to-Text: Convert audio to text using the Voxtral-Mini-3B model
🚀 GPU Acceleration: CUDA and cuDNN support for faster inference
📦 Audio Format and Codec Support: WAV, MP3, FLAC, OGG, M4A, and more; see https://docs.rs/symphonia/latest/symphonia/index.html for the full list
⚡ Performance: F16 memory optimization, chunked processing
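
To illustrate what chunked processing means in practice, here is a minimal sketch of splitting decoded audio into fixed-length windows. It is not taken from the kitsune-stt source: the function name `chunk_samples`, the 16 kHz sample rate, and the 30-second window used in the usage note are assumptions chosen for illustration.

```rust
/// Split mono PCM samples into consecutive chunks of at most `chunk_secs` seconds.
/// The final chunk may be shorter. The chunks borrow from `samples`, so nothing is copied.
/// (Hypothetical helper; not part of the kitsune-stt API.)
fn chunk_samples(samples: &[f32], sample_rate: usize, chunk_secs: usize) -> Vec<&[f32]> {
    // `chunks` panics on a zero length, so clamp the chunk size to at least one sample.
    let chunk_len = (sample_rate * chunk_secs).max(1);
    samples.chunks(chunk_len).collect()
}
```

Calling `chunk_samples(&samples, 16_000, 30)` on 70 seconds of audio yields three chunks of 30, 30, and 10 seconds. Each chunk could then be transcribed in turn, which keeps peak memory bounded regardless of the length of the input file.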

Quick Start

Installation

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone the repository
git clone https://github.com/paazmaya/kitsune-stt.git
cd kitsune-stt

There are two optional features, cuda and cudnn. Each requires additional libraries to be installed, such as the CUDA toolkit listed under Requirements.

Running

GPU (Recommended):

cargo run --all-features --release -- audio.wav

CPU Only:

cargo run --release -- --cpu audio.wav

Transcribe Audio

The following example commands create an audio.txt file in the same folder as the source audio file.

# Transcribe an audio file
cargo run --release -- --input audio.wav

# Force CPU mode
cargo run --release --features cuda -- --cpu --input audio.wav
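
The output naming described above (audio.wav becomes audio.txt in the same folder) can be sketched with the standard library alone. This is an illustration of the naming rule, not the project's actual code; `transcript_path` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// Derive the transcript path by replacing the audio file's extension with `txt`.
/// The parent directory is preserved, so the transcript lands next to the source file.
/// (Hypothetical helper; not part of the kitsune-stt API.)
fn transcript_path(audio: &Path) -> PathBuf {
    audio.with_extension("txt")
}
```

For example, `transcript_path(Path::new("recordings/audio.wav"))` gives `recordings/audio.txt`.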

Testing

Run the complete test suite:

# All tests
cargo test

# With all features
cargo test --all-features

Code Quality

# Format code
cargo fmt --all

# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings

# Security audit
cargo install cargo-audit
cargo audit

Development

See CONTRIBUTING.md for detailed development guidelines.

Requirements

Rust 1.70+
pkg-config (Linux)
CUDA toolkit (optional, for GPU)
~7GB disk space (for model files)

License

MIT License - see LICENSE file for details.
