Bridging the Communication Gap with Artificial Intelligence
EduSign AI is a bi-directional AI communication platform that connects deaf and hearing individuals in real time. Powered by deep learning, computer vision, and 3D avatar technology, it translates speech and text into sign language and interprets sign language gestures back into speech and text.
Whether in classrooms, workplaces, or online meetings, EduSign AI ensures that everyone, regardless of hearing ability, can understand, participate, and communicate effortlessly.
Deaf and hard-of-hearing individuals often face barriers in education, meetings, and daily life. Existing captioning tools capture only text, missing the full expression of tone, emotion, and gesture. There is a global need for an accessible solution that enables two-way communication between signers and speakers in any environment.
EduSign AI bridges this gap through an intelligent system that:
- Recognizes speech and text, converting them into expressive sign language via a 3D avatar.
- Understands sign gestures from users and converts them into spoken or written language.
- Displays real-time transcripts alongside the avatar to ensure clarity and learning support.
It empowers full participation in classrooms, online meetings (Zoom, Google Meet, Teams), and social environments, creating an inclusive communication experience for all.
Speech-to-Sign flow (a sketch follows this list):
- Captures live speech using Whisper AI or Google STT.
- Converts the text into sign language animations via a 3D avatar.
- Simultaneously displays transcripts as captions for learning support.
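A minimal sketch of this flow, assuming the open-source openai-whisper package; `render_sign_sequence` is a hypothetical placeholder for the avatar animation service, not the project's actual API:

```python
# Speech-to-sign sketch (assumes: pip install openai-whisper).
import whisper

model = whisper.load_model("base")  # a small model keeps latency low

def render_sign_sequence(text: str) -> None:
    # Hypothetical placeholder: the real system drives the 3D avatar here.
    print(f"[avatar] signing: {text}")

def speech_to_sign(audio_path: str) -> str:
    result = model.transcribe(audio_path)   # speech -> text
    text = result["text"].strip()
    render_sign_sequence(text)              # text -> avatar animation
    return text                             # also shown as a live caption
```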
Sign-to-Speech flow (a sketch follows this list):
- Captures sign language gestures through MediaPipe or OpenPose.
- Interprets gestures into text using a fine-tuned WLASL-based model.
- Converts text to speech and shows a transcript confirmation.
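A matching sketch for the reverse direction, using MediaPipe Hands and gTTS; `classify_sign` is a stand-in for the fine-tuned WLASL/GSL model and returns a dummy gloss here:

```python
# Sign-to-speech sketch (assumes: pip install mediapipe gtts opencv-python).
import cv2
import mediapipe as mp
from gtts import gTTS

mp_hands = mp.solutions.hands

def classify_sign(hand_landmarks) -> str:
    # Hypothetical placeholder for the fine-tuned recognition model.
    return "hello"

def sign_to_speech(frame_path: str, out_audio: str = "sign.mp3") -> str:
    image = cv2.cvtColor(cv2.imread(frame_path), cv2.COLOR_BGR2RGB)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        result = hands.process(image)
    if not result.multi_hand_landmarks:
        return ""                                  # no hands detected
    text = classify_sign(result.multi_hand_landmarks)
    gTTS(text=text, lang="en").save(out_audio)     # gesture -> spoken audio
    return text                                    # shown as a transcript confirmation
```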
Controls and language support:
- Users can toggle transcripts, adjust signing speed, or replay translations (modeled in the sketch below).
- Supports multiple spoken and sign languages for global inclusivity.
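These controls could map onto a per-session preference model; a hypothetical Pydantic sketch (field names are illustrative, not the project's schema):

```python
from pydantic import BaseModel

class SessionPreferences(BaseModel):
    show_transcripts: bool = True   # toggle live captions beside the avatar
    signing_speed: float = 1.0      # playback-rate multiplier for avatar signing
    allow_replay: bool = True       # keep recent translations for replay
    spoken_language: str = "en"     # STT/TTS language code
    sign_language: str = "GSL"      # target sign language (GSL, ASL, BSL, ...)
```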
Key Features:
- Speech-to-Sign Translation: Converts voice or text to sign language via an expressive avatar.
- Sign-to-Speech Recognition: Converts gestures to spoken or written words using AI vision.
- 3D Avatar Interpreter: Natural, emotion-aware signing with lifelike gestures.
- Real-Time Transcripts: Displayed alongside the avatar for clarity and accessibility.
- Multi-Platform Support: Integrates into Zoom, Google Meet, Teams, and web apps.
- Offline Mode: Works in low-connectivity areas using local AI models.
- Multi-Language + Multi-Sign Support: GSL, ASL, BSL, and more.
- Emotion-Aware Communication: Preserves tone and intent across translations.
- Adaptive Learning: Improves translation accuracy with community feedback.
- Supabase Integration: Cloud sync, user preferences, and dataset management (see the sketch below).
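As a hedged illustration of the Supabase piece, preferences could be synced with supabase-py roughly as follows; the `user_preferences` table and its columns are assumptions:

```python
# Assumes: pip install supabase, with SUPABASE_URL / SUPABASE_KEY set.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def save_preferences(user_id: str, prefs: dict) -> None:
    # Upsert so repeated saves update the same row instead of duplicating it.
    supabase.table("user_preferences").upsert({"user_id": user_id, **prefs}).execute()

def load_preferences(user_id: str) -> dict:
    rows = supabase.table("user_preferences").select("*").eq("user_id", user_id).execute()
    return rows.data[0] if rows.data else {}
```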
Tech Stack:
Frontend: React.js, Next.js, Tailwind CSS
Backend: FastAPI, Supabase
AI Components: TensorFlow / PyTorch, MediaPipe, Whisper, gTTS
3D Rendering: Three.js or Unity WebGL for avatar gestures
Integrations: Zoom SDK, Google Meet API, Microsoft Teams SDK
Deployment: Docker + Vercel
Prerequisites:
- Python 3.8+
- Node.js 18+
- PyTorch 2.0+
- MediaPipe 0.10+
Setup:

```bash
# Backend (API server)
cd backend
pip install -r ../requirements.txt
uvicorn app.main:app --reload
```

```bash
# Frontend (run from the frontend/ directory)
npm install
npm run dev
```

Our model is fine-tuned on Ghanaian Sign Language (GSL), achieving 98.18% validation accuracy and 98.54% overall test accuracy across 1,485 sign classes with the FullI3D architecture.
Training Pipeline:
```bash
# Preprocess landmarks (one-time)
python scripts/preprocess_landmarks.py

# Fine-tune with FullI3D architecture (recommended)
python scripts/train_edusign_gsl.py \
    --architecture i3d \
    --pretrained-model backend/app/models/pretrained_wlasl.pth \
    --epochs 50 \
    --batch-size 8 \
    --base-channels 64 \
    --fine-tune-lr 0.0001 \
    --augment --oversample \
    --loss focal --class-weights \
    --scheduler cosine

# Alternative: SimpleI3D (faster training)
python scripts/train_edusign_gsl.py \
    --architecture simple \
    --pretrained-model backend/app/models/pretrained_wlasl.pth \
    --epochs 50 \
    --batch-size 16 \
    --fine-tune-lr 0.0001

# Test model
python scripts/inference_edusign_gsl.py \
    --model backend/app/models/edusign_gsl_finetuned.pth \
    --input <frame.jpg>

# Detailed evaluation
python scripts/evaluate_model_detailed.py \
    --model backend/app/models/edusign_gsl_finetuned.pth \
    --output-dir evaluation_results
```

FullI3D Architecture (Current Best):
- Validation Accuracy: 98.18% (best), 97.27% (final)
- Test Accuracy: 98.54%
- Top-5 Accuracy: 100.00%
- Classes: 1,485 GSL signs
- Architecture: FullI3D (3D Convolutions)
- Base Model: WLASL pretrained on 2,000 ASL signs
- Model Size: 46MB
- Training Accuracy: 99.54%
Previous SimpleI3D (Baseline):
- Validation Accuracy: 95.45%
- Architecture: Simplified I3D (LSTM-based)
- Improvement: +2.73% with FullI3D
Our GSL dataset includes:
- 1,525 dictionary entries with sign meanings
- 8,980 validated frames extracted from YouTube videos
- MediaPipe landmarks pre-extracted and cached for fast training (a caching sketch follows the commands below)
Data Preparation:

```bash
# Extract dictionary from PDF
python scripts/extract_gsl_dictionary.py

# Download YouTube videos
python scripts/download_youtube_sign_videos.py <video_url>

# Extract and validate frames
python scripts/extract_frames_from_videos.py
python scripts/validate_video_frames.py
```
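The cached-landmark step (`preprocess_landmarks.py`) can be pictured roughly like this; the directory layout and array shape are assumptions, not the script's actual implementation:

```python
# Landmark pre-extraction and caching sketch (assumed paths).
import glob
import os

import cv2
import mediapipe as mp
import numpy as np

def extract_and_cache(frames_dir="data/frames", cache_dir="data/landmarks"):
    os.makedirs(cache_dir, exist_ok=True)
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        for path in glob.glob(os.path.join(frames_dir, "*.jpg")):
            image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
            result = hands.process(image)
            coords = np.zeros((2, 21, 3), dtype=np.float32)  # 2 hands x 21 points x (x, y, z)
            if result.multi_hand_landmarks:
                for h, hand in enumerate(result.multi_hand_landmarks[:2]):
                    coords[h] = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
            # Training then loads these .npy files instead of re-running MediaPipe.
            np.save(os.path.join(cache_dir, os.path.basename(path) + ".npy"), coords)
```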
The FastAPI backend provides:
- Sign Recognition: Recognizes GSL signs from images/videos
- Sign-to-Speech: Converts recognized signs to speech/text
- Speech-to-Text: Converts speech to text
- Text-to-Sign: Converts text to sign language representation
- Health Checks: System status monitoring
See backend/app/routes/ for the endpoint definitions; an illustrative route sketch follows.
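The paths and response shapes below are assumptions for orientation only; the real definitions live in backend/app/routes/:

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI(title="EduSign AI API")

@app.get("/health")
def health() -> dict:
    # System status monitoring endpoint.
    return {"status": "ok"}

@app.post("/recognize-sign")
async def recognize_sign(file: UploadFile = File(...)) -> dict:
    frame = await file.read()  # raw image bytes from the client
    # Hypothetical: run the fine-tuned GSL model on the uploaded frame.
    gloss, confidence = "hello", 0.98
    return {"gloss": gloss, "confidence": confidence}
```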
EduSign AI promotes digital inclusion by ensuring equal participation for deaf and hearing individuals. It aligns with UN SDG 4 (Quality Education) and SDG 10 (Reduced Inequalities), making education, meetings, and communication accessible to everyone, everywhere.
EduSign AI aims to become a universal AI interpreter that supports:
- Augmented and Virtual Reality classrooms.
- Emotion-based sign interpretation.
- A global open-source sign language dataset (EduSign Corpus).
- Plug-ins for web, mobile, and smart devices.
Cbreve: Innovators passionate about AI, accessibility, and social impact.
Open for academic and non-commercial use. For research or partnership inquiries, please contact the EduSign AI team.
Large raw assets (PDFs, videos, images, models) are intentionally not tracked in Git (see .gitignore). Share via external storage or Git LFS if needed.
```
EduSign-AI/
├── backend/                  # FastAPI backend (AI & API)
│   ├── app/
│   │   ├── models/           # Trained ML models
│   │   ├── services/         # Business logic (avatar, sign recognition)
│   │   └── routes/           # API endpoints
│   └── data/                 # Datasets and processed data
├── frontend/                 # Next.js frontend
│   ├── public/
│   │   ├── avatar_models/    # 3D avatar files (Ready Player Me)
│   │   └── ready-player-me/
│   │       ├── models/       # .glb/.gltf files
│   │       ├── textures/     # Texture maps
│   │       └── animations/   # Sign language animations
│   └── src/                  # React components and pages
├── integrations/             # SDKs for Zoom, Google Meet, Teams
├── scripts/                  # Data processing and training scripts
└── docs/                     # Project documentation
```