A comprehensive Streamlit application that leverages Voxtral Mini 3B for audio processing, transcription, summarization, and multilingual Q&A capabilities.
- π― Audio Transcription: Automatically transcribe uploaded audio files
- π Content Summarization: Generate comprehensive summaries of audio content
- π¬ Multilingual Q&A: Ask questions about audio content in multiple languages
- π¨ Modern UI: Beautiful, responsive interface with real-time progress indicators
- π± Session Management: Persistent chat history and file management
- βοΈ Configurable: Adjustable model parameters (temperature, top_p)
- Python 3.8 or higher
- Voxtral API endpoint (you can run Voxtral locally or use a hosted service)
- For Google Colab setup: ngrok account (free at https://ngrok.com)
-
Clone or download this repository
git clone <repository-url> cd Voxtral-Demo
-
Install dependencies
pip install -r requirements.txt
-
Configure Voxtral API
Edit
app.pyand update the API endpoint:# In the initialize_client() function openai_api_base = "https://your-voxtral-endpoint.com/v1"
-
Run the application
streamlit run app.py
-
Open your browser Navigate to
http://localhost:8501
For users with limited local resources, you can run Voxtral in Google Colab with T4 GPU:
-
Run the Colab setup script:
python colab_setup.py
-
Upload
colab_vllm_setup.ipynbto Google Colab -
Get your ngrok auth token from https://dashboard.ngrok.com/get-started/your-authtoken
-
Run the notebook in Colab - this will start vLLM and create an ngrok tunnel
-
Copy the ngrok URL and update your local configuration:
python update_ngrok_url.py https://your-ngrok-url.ngrok-free.app
-
Run the Streamlit app locally:
streamlit run app.py
Note: Keep the Colab notebook running to maintain the ngrok tunnel.
- Click "Browse files" to upload an audio file
- Supported formats: MP3, WAV, M4A, FLAC, OGG
- The file will be temporarily stored for processing
- Generate Summary: Click to create a comprehensive summary of the audio content
- Transcribe Audio: Click to get the full text transcription with real-time progress
- Select your preferred language from the sidebar
- Type your question in the text input
- Click "Ask Question" to get an AI-powered response
- View conversation history in expandable sections
- Language Selection: Choose from 12 supported languages
- Model Parameters: Adjust temperature and top_p for response generation
- Session Management: Clear session data when needed
- English
- Spanish
- French
- German
- Italian
- Portuguese
- Russian
- Chinese
- Japanese
- Korean
- Arabic
- Hindi
-
Local Setup (Recommended for development):
# Install vLLM pip install vllm # Run Voxtral locally vllm serve mistralai/Voxtral-Mini-3B-2507 --host 0.0.0.0 --port 8000
-
Update API endpoint in app.py:
openai_api_base = "http://localhost:8000/v1"
You can also use environment variables for configuration:
export VOXTRAL_API_BASE="http://localhost:8000/v1"
export VOXTRAL_API_KEY="your-api-key"Voxtral-Demo/
βββ app.py # Main Streamlit application
βββ requirements.txt # Python dependencies
βββ README.md # This file
βββ audio_qa.ipynb # Original Q&A notebook
βββ transcription.ipynb # Original transcription notebook
βββ translation.ipynb # Original translation notebook
βββ sample_audio.mp3 # Sample audio file
βββ output.wav # Output file
- Frontend: Streamlit with custom CSS styling
- Backend: OpenAI-compatible API client for Voxtral
- Audio Processing: Mistral Common library for audio chunking
- Session Management: Streamlit session state for persistence
initialize_client(): Sets up OpenAI client for Voxtral APIfile_to_chunk(): Converts audio files to AudioChunk formattranscribe_audio(): Handles audio transcription with streaminggenerate_summary(): Creates content summariesask_question(): Processes multilingual Q&A requests
-
API Connection Error
- Verify your Voxtral endpoint is running
- Check network connectivity
- Ensure correct API base URL
-
Audio Processing Errors
- Verify audio file format is supported
- Check file size (recommended < 100MB)
- Ensure audio file is not corrupted
-
Memory Issues
- Reduce audio file size
- Clear session data periodically
- Restart the application if needed
Enable debug logging by adding this to your environment:
export STREAMLIT_LOG_LEVEL=debug- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Mistral AI for Voxtral model
- Streamlit for the web framework
- vLLM for model serving
For issues and questions:
- Create an issue in this repository
- Check the troubleshooting section above
- Refer to Voxtral documentation at Hugging Face