Thanks to visit codestin.com
Credit goes to github.com

Skip to content

AashiDutt/Voxtral_with_vLLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🎡 Voxtral Audio Assistant

A comprehensive Streamlit application that leverages Voxtral Mini 3B for audio processing, transcription, summarization, and multilingual Q&A capabilities.

✨ Features

  • 🎯 Audio Transcription: Automatically transcribe uploaded audio files
  • πŸ“‹ Content Summarization: Generate comprehensive summaries of audio content
  • πŸ’¬ Multilingual Q&A: Ask questions about audio content in multiple languages
  • 🎨 Modern UI: Beautiful, responsive interface with real-time progress indicators
  • πŸ“± Session Management: Persistent chat history and file management
  • βš™οΈ Configurable: Adjustable model parameters (temperature, top_p)

πŸš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • Voxtral API endpoint (you can run Voxtral locally or use a hosted service)
  • For Google Colab setup: ngrok account (free at https://ngrok.com)

Installation

  1. Clone or download this repository

    git clone <repository-url>
    cd Voxtral-Demo
  2. Install dependencies

    pip install -r requirements.txt
  3. Configure Voxtral API

    Edit app.py and update the API endpoint:

    # In the initialize_client() function
    openai_api_base = "https://your-voxtral-endpoint.com/v1"
  4. Run the application

    streamlit run app.py
  5. Open your browser Navigate to http://localhost:8501

πŸš€ Google Colab Setup (Recommended for T4 GPU)

For users with limited local resources, you can run Voxtral in Google Colab with T4 GPU:

  1. Run the Colab setup script:

    python colab_setup.py
  2. Upload colab_vllm_setup.ipynb to Google Colab

  3. Get your ngrok auth token from https://dashboard.ngrok.com/get-started/your-authtoken

  4. Run the notebook in Colab - this will start vLLM and create an ngrok tunnel

  5. Copy the ngrok URL and update your local configuration:

    python update_ngrok_url.py https://your-ngrok-url.ngrok-free.app
  6. Run the Streamlit app locally:

    streamlit run app.py

Note: Keep the Colab notebook running to maintain the ngrok tunnel.

πŸ“– Usage Guide

1. Audio Upload

  • Click "Browse files" to upload an audio file
  • Supported formats: MP3, WAV, M4A, FLAC, OGG
  • The file will be temporarily stored for processing

2. Audio Processing

  • Generate Summary: Click to create a comprehensive summary of the audio content
  • Transcribe Audio: Click to get the full text transcription with real-time progress

3. Multilingual Q&A

  • Select your preferred language from the sidebar
  • Type your question in the text input
  • Click "Ask Question" to get an AI-powered response
  • View conversation history in expandable sections

4. Configuration

  • Language Selection: Choose from 12 supported languages
  • Model Parameters: Adjust temperature and top_p for response generation
  • Session Management: Clear session data when needed

🌍 Supported Languages

  • English
  • Spanish
  • French
  • German
  • Italian
  • Portuguese
  • Russian
  • Chinese
  • Japanese
  • Korean
  • Arabic
  • Hindi

πŸ”§ Configuration

Voxtral API Setup

  1. Local Setup (Recommended for development):

    # Install vLLM
    pip install vllm
    
    # Run Voxtral locally
    vllm serve mistralai/Voxtral-Mini-3B-2507 --host 0.0.0.0 --port 8000
  2. Update API endpoint in app.py:

    openai_api_base = "http://localhost:8000/v1"

Environment Variables

You can also use environment variables for configuration:

export VOXTRAL_API_BASE="http://localhost:8000/v1"
export VOXTRAL_API_KEY="your-api-key"

πŸ“ Project Structure

Voxtral-Demo/
β”œβ”€β”€ app.py                 # Main Streamlit application
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ README.md             # This file
β”œβ”€β”€ audio_qa.ipynb        # Original Q&A notebook
β”œβ”€β”€ transcription.ipynb   # Original transcription notebook
β”œβ”€β”€ translation.ipynb     # Original translation notebook
β”œβ”€β”€ sample_audio.mp3      # Sample audio file
└── output.wav           # Output file

πŸ› οΈ Technical Details

Architecture

  • Frontend: Streamlit with custom CSS styling
  • Backend: OpenAI-compatible API client for Voxtral
  • Audio Processing: Mistral Common library for audio chunking
  • Session Management: Streamlit session state for persistence

Key Functions

  • initialize_client(): Sets up OpenAI client for Voxtral API
  • file_to_chunk(): Converts audio files to AudioChunk format
  • transcribe_audio(): Handles audio transcription with streaming
  • generate_summary(): Creates content summaries
  • ask_question(): Processes multilingual Q&A requests

πŸ› Troubleshooting

Common Issues

  1. API Connection Error

    • Verify your Voxtral endpoint is running
    • Check network connectivity
    • Ensure correct API base URL
  2. Audio Processing Errors

    • Verify audio file format is supported
    • Check file size (recommended < 100MB)
    • Ensure audio file is not corrupted
  3. Memory Issues

    • Reduce audio file size
    • Clear session data periodically
    • Restart the application if needed

Debug Mode

Enable debug logging by adding this to your environment:

export STREAMLIT_LOG_LEVEL=debug

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ“ž Support

For issues and questions:

  • Create an issue in this repository
  • Check the troubleshooting section above
  • Refer to Voxtral documentation at Hugging Face

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors