An AI-powered web application designed specifically for visually impaired individuals to navigate train environments safely using voice commands and camera vision.
- ✅ API Key Security: Moved OpenRouter API key from frontend to backend
- ✅ Backend Proxy: Created `/vision-query` endpoint to handle API calls securely
- ✅ Environment Variables: All sensitive keys now stored in `.env` file
- ✅ Sound Notifications:
  - Two-beep success sound (800Hz + 1000Hz) when image processing completes
  - Error sound (400Hz) for failed operations
  - All key actions have audio feedback
- ✅ Screen Reader Support:
  - ARIA labels on all interactive elements
  - Live regions for dynamic content updates
  - Proper heading hierarchy
  - Role attributes for semantic structure
- ✅ Keyboard Navigation:
  - Full keyboard support (Enter/Space on all buttons)
  - Visible focus indicators
  - Skip to main content link
- ✅ Voice Feedback:
  - Slower speech rate (0.9x) for clarity
  - Auto-read responses aloud
  - Status announcements
Create a `.env` file in the root directory:

```env
# OpenRouter API Key for vision model (REQUIRED)
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Roboflow API Key (REQUIRED)
ROBOFLOW_API_KEY=your_roboflow_api_key_here
```

Note: Speech recognition uses Google's FREE service (no API key needed)!
```bash
# Install Python dependencies
pip install flask python-dotenv pydub SpeechRecognition opencv-python numpy inference-sdk requests

# Install system dependencies for audio processing
# On Ubuntu/Debian:
sudo apt-get install ffmpeg

# On macOS:
brew install ffmpeg
```

```
your-project/
├── app.py                # Flask backend with secure API endpoints
├── .env                  # Environment variables (DO NOT COMMIT)
├── .env.example          # Example environment file
├── static/
│   ├── css/
│   │   └── style.css     # Accessibility-enhanced styles
│   ├── js/
│   │   └── app.js        # Frontend with sound notifications
│   └── pictures/
│       ├── fewshot1.jpg  # Training example images
│       ├── fewshot2.jpg
│       └── fewshot3.jpg
└── templates/
    └── index.html        # Accessible HTML template
```
Development:

```bash
python app.py
```

Production:

```bash
gunicorn app:app --bind 0.0.0.0:5000 --workers 2 --timeout 120
```

The application will be available at http://localhost:5000
This project is configured for one-click deployment to Render:
- Quick Deploy: See RENDER_DEPLOYMENT.md for detailed instructions
- Free Tier: Deploy on Render's free tier with auto-scaling
- System Dependencies: Automatically installs ffmpeg and OpenCV dependencies
Key Files for Deployment:
- `render.yaml` - Render Blueprint configuration
- `Procfile` - Process file for gunicorn
- `Aptfile` - System dependencies (ffmpeg, opencv libs)
- `runtime.txt` - Python version specification
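For orientation, a minimal `render.yaml` for this kind of Flask service looks roughly like the sketch below. This is an illustrative reconstruction, not the project's actual file; field names follow Render's Blueprint schema, and `sync: false` marks keys you enter in the Render dashboard rather than commit. Defer to the real `render.yaml` and RENDER_DEPLOYMENT.md where they differ.

```yaml
services:
  - type: web
    name: voice-vision-assistant   # hypothetical service name
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: gunicorn app:app --bind 0.0.0.0:$PORT --workers 2 --timeout 120
    envVars:
      - key: OPENROUTER_API_KEY
        sync: false
      - key: ROBOFLOW_API_KEY
        sync: false
```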
- Railway: Similar to Render, use `Procfile`
- Fly.io: Create `fly.toml` configuration
- DigitalOcean: Use App Platform with buildpack
- VPS: Use nginx + gunicorn + systemd
See RENDER_DEPLOYMENT.md for complete deployment guide.
Before (insecure):

```javascript
// ❌ API key exposed in frontend
const apiKey = 'sk-or-v1-42...';
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  headers: { 'Authorization': `Bearer ${apiKey}` }
});
```

After (secure):

```javascript
// ✅ API call goes through backend
const response = await fetch('/vision-query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages, model, temperature })
});
```

Backend handles the API key securely:

```python
# app.py
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")

@app.route('/vision-query', methods=['POST'])
def vision_query():
    # Forward the client's payload; the key never leaves the server
    response = requests.post(
        'https://openrouter.ai/api/v1/chat/completions',
        headers={'Authorization': f'Bearer {OPENROUTER_API_KEY}'},
        json=request.get_json()
    )
    return response.json(), response.status_code
```

- Success Sound: Plays when image is captured and processed
- Error Sound: Plays when operations fail
- Voice Announcements: Screen reader announces all state changes
- Response Reading: AI responses are automatically read aloud
- High Contrast Mode: Supports system high contrast settings
- Reduced Motion: Respects prefers-reduced-motion preference
- Focus Indicators: Clear 3px outline on focused elements
- Large Text Support: Scales properly with browser zoom
- Tab Navigation: All interactive elements are keyboard accessible
- Enter/Space: Activate buttons without mouse
- Skip Link: Jump directly to main content
- Focus Trap: Logical focus order throughout the app
- ARIA Labels: Descriptive labels on all controls
- Live Regions: Dynamic updates announced automatically
- Semantic HTML: Proper heading hierarchy and landmarks
- Alt Text: All images have appropriate alt text
```javascript
// Success sound (image captured/processed)
playSuccessSound(); // Two beeps: 800Hz → 1000Hz

// Error sound (operation failed)
playErrorSound(); // Single beep: 400Hz
```

These sounds help visually impaired users know when:
- Image has been captured successfully
- Voice transcription is complete
- Send button is now enabled
- API response is ready
- Any errors occur
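In the app itself these cues are synthesized in the browser with the Web Audio API (`app.js`). Purely as an illustration of the tone sequence, here is a framework-free Python sketch that generates the same beeps as raw samples; the helper names are hypothetical:

```python
import math

SAMPLE_RATE = 44100  # samples per second

def tone(freq_hz: float, duration_s: float) -> list[float]:
    """One sine-wave beep as a list of samples in [-1, 1]."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def success_samples() -> list[float]:
    """Two rising beeps: 800 Hz then 1000 Hz, 0.1 s each."""
    return tone(800, 0.1) + tone(1000, 0.1)

def error_samples() -> list[float]:
    """Single low beep: 400 Hz, 0.2 s."""
    return tone(400, 0.2)
```

The rising second beep is what makes the success cue distinguishable from the lower, flat error cue without any visual feedback.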
Serves the main application interface
Transcribes voice recording to text
- Input: Audio file (OGG format)
- Output: `{ "text": "transcribed text" }`

Securely handles OpenRouter vision API calls
- Input: `{ "messages": [...], "model": "qwen/qwen-2-vl-72b-instruct", "temperature": 0.3, "max_tokens": 150 }`
- Output: OpenRouter API response

Processes image with Roboflow
- Input: Form data with `image` and `user_query`
- Output: `{ "response": "analysis result" }`
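Since the backend should validate all inputs before spending an API call, the `/vision-query` payload above can be checked with a small framework-free helper. This is a sketch, not the project's actual code; the function name and error messages are illustrative, and the defaults mirror the ones documented above:

```python
def validate_vision_payload(payload: dict) -> dict:
    """Validate and normalize a /vision-query request body.

    Returns the normalized payload, or raises ValueError with a
    message suitable for a 400 response.
    """
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("'messages' must be a non-empty list")
    temperature = payload.get("temperature", 0.3)
    if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 2:
        raise ValueError("'temperature' must be a number between 0 and 2")
    return {
        "messages": messages,
        "model": payload.get("model", "qwen/qwen-2-vl-72b-instruct"),
        "temperature": temperature,
        "max_tokens": payload.get("max_tokens", 150),
    }
```

In the Flask route, a raised `ValueError` would be caught and returned as a 400 before any call to OpenRouter is made.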
- ✅ Chrome/Edge 90+
- ✅ Firefox 88+
- ✅ Safari 14+
- ✅ Mobile browsers (iOS Safari, Chrome Mobile)
- Never commit the `.env` file - add it to `.gitignore`
- Use environment variables for all API keys
- Implement rate limiting on API endpoints (recommended)
- Use HTTPS in production
- Validate all inputs on backend
- Set CORS headers appropriately
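The rate-limiting recommendation can be prototyped with a small in-memory sliding window. This is an illustrative sketch, not the project's code; in production you would more likely reach for Flask-Limiter or a proxy-level limit, and note that an in-memory store is per-process, so the 2-worker gunicorn setup above would need shared storage (e.g. Redis) for exact limits:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` requests per `window` seconds per client."""

    def __init__(self, limit: int = 30, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        q = self.hits[client_id]
        # Drop timestamps that have fallen out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

In a Flask route this would be checked against `request.remote_addr` before doing any work, returning 429 when `allow()` is False.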
```javascript
// User must interact with page first (browser security)
// The first button click will initialize AudioContext
```

```bash
# Make sure .env file exists and has correct keys
# Restart Flask server after changing .env
```

```javascript
// Some browsers require HTTPS for speech synthesis
// Test in development with http://localhost:5000
```

- Screen Reader: Test with NVDA (Windows) or VoiceOver (Mac)
- Keyboard Only: Navigate without mouse
- Color Contrast: Use browser DevTools accessibility audit
- Zoom: Test at 200% zoom level
- Always add sound feedback for state changes
- Include ARIA labels for new UI elements
- Test with keyboard navigation
- Announce changes to screen readers
© 2025 Voice & Vision Assistant | All Rights Reserved
When contributing, please ensure:
- All new features have accessibility support
- API keys remain secure in backend
- Sound feedback is added for user actions
- ARIA labels are included
- Keyboard navigation works properly
For issues or questions:
- Check the Common Issues section
- Review browser console for errors
- Ensure all dependencies are installed
- Verify environment variables are set correctly