SignAI is a sign language recognition and translation system that uses machine learning to interpret German Sign Language (DGS) in real time and produce gloss-style translations. This repository contains the recognition engine, the frontend UI, the product website, the preprocessing and training pipelines, and the inference tooling.
Primary languages: Python (core, app), CSS/HTML/JavaScript (frontend for web).
Note: v1.1.0 is the first fully working release. The project is actively developed and some operational aspects (admin privileges, resource requirements) are still being refined. See Known Issues & Roadmap for details.
## Contents

- Quick links
- Highlights (v1.0.0)
- Requirements
- Installation (end user)
- Quick start (developer / local run)
- Models & AI
  - Model artifacts and training workflow
  - Seq2Seq architecture (detailed)
  - Training visualizations
- Preprocessing
- Usage & tips
- Technical notes & baseline metrics
- Known issues & workarounds
- Roadmap
- Contributing
- License
- Media & acknowledgements
- Contact
## Quick links

- Website / Downloads: https://www.signai.dev/download
- Issues & support: https://github.com/Stefanos0710/SignAI/issues
- Releases & changelog: https://github.com/Stefanos0710/SignAI/releases
- Full repository: https://github.com/Stefanos0710/SignAI
## Highlights (v1.0.0)

- New DGS model with a vocabulary of 800+ gloss tokens.
- Sentence-level gloss translation for sequences up to 15 tokens.
- Improved finetuning and a faster, more secure inference pipeline.
- Operational and UX improvements for camera handling and startup.
- Baseline training metrics (compressed dataset): training accuracy ≈ 30%, validation ≈ 25%.
## Requirements

- Supported platforms: Windows (primary); macOS, Linux, Android, and iOS builds are planned.
- Webcam or compatible video input for live recognition.
- Disk space: minimum 5 GB free (models/caches may require more).
- Python 3.8+ (for development and source builds).
- Recommended: GPU for faster inference and training; CPU-only inference is supported but slower.
## Installation (end user)

- Visit https://www.signai.dev/download and download the appropriate installer for your OS.
- Run the installer and follow the on-screen instructions.
- Launch the SignAI application.
### Troubleshooting
- If the camera feed does not appear on startup, click the “Switch Camera” button repeatedly until the correct feed appears (the OS or other apps might lock the camera).
- First run may take several seconds while libraries and model files load; please wait for the UI to become responsive.
### Security note

- Some operations in this release may require administrator privileges (installation, camera access, certain model-management tasks). Future releases will reduce these requirements or provide safer alternatives.
- If a console window appears, do not close it: closing it also closes the main app, and it displays debug output and additional translation details.
## Quick start (developer / local run)

- Clone the repository:

  ```bash
  git clone https://github.com/Stefanos0710/SignAI.git
  ```

- Create & activate a virtual environment:

  ```bash
  python -m venv .venv
  ```

  - Windows:

    ```bash
    .venv\Scripts\activate
    ```

  - macOS / Linux:

    ```bash
    source .venv/bin/activate
    ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Start the app (development mode):

  ```bash
  cd app
  python app.py
  ```
For training:

- Use `python train.py` for single-word classification, or `python train-seq2seq.py` for sentence-level training (see Models & AI).
## Models & AI

### Model artifacts and training workflow

- Model artifacts are stored in `models/` (Keras checkpoints, final models, and training history JSON/CSV files).
- Training scripts:
  - `train.py`: single-word classification training loop.
  - `train-seq2seq.py`: sequence-to-sequence training for sentence-level gloss translation.
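Sentence-level training also needs a mapping from gloss tokens to integer ids. The sketch below is hypothetical: the reserved ids, function names, and special tokens are illustrative conventions, not the repository's actual code.

```python
PAD, SOS, EOS, UNK = 0, 1, 2, 3  # reserved ids (illustrative convention)

def build_vocab(gloss_sequences):
    """Map each gloss token to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": PAD, "<sos>": SOS, "<eos>": EOS, "<unk>": UNK}
    for seq in gloss_sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(seq, vocab, max_len):
    """Wrap a gloss sequence in <sos>/<eos>, then pad or truncate to max_len."""
    ids = [SOS] + [vocab.get(t, UNK) for t in seq] + [EOS]
    return ids[:max_len] + [PAD] * max(0, max_len - len(ids))
```

With the v1.0.0 model, such a vocabulary would hold 800+ gloss tokens, and `max_len` would cap sentences at 15 tokens plus the special markers.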
- Typical training flow:
  1. Run preprocessing to produce feature files (keypoint embeddings or frame features).
  2. Create TF/PyTorch datasets and dataloaders.
  3. Build or load a model from `model.py`.
  4. Configure augmentation, optimizers, and losses.
  5. Train with callbacks (`ModelCheckpoint`, `EarlyStopping`, CSV/JSON history).
  6. Save the final model and training history.
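The early-stopping behavior used in training can be illustrated without any framework. This is a minimal sketch of the patience logic behind an `EarlyStopping`-style callback; the class name and defaults are illustrative, not the project's code.

```python
class EarlyStopper:
    """Stop training when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to tolerate without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # new best: reset the patience counter
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience
```

Called once per epoch with the validation loss, it returns `True` as soon as `patience` consecutive epochs pass without improvement.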
When changing the model architecture, keep checkpoint compatibility in mind (naming conventions or conversion scripts ease migration).
### Seq2Seq architecture (detailed)

This project's sentence translation uses an encoder–decoder (seq2seq) architecture with additive attention. Summary of the implemented architecture:
- Encoder
  - Input: variable-length sequences of per-frame features (shape: batch × time_steps × num_features).
  - Masking to ignore padded frames.
  - Bidirectional LSTM (returning sequences and forward/backward final states).
  - Forward and backward final states are concatenated to initialize the decoder.
- Decoder
  - Token input sequence (previous tokens during training, i.e. teacher forcing).
  - Embedding layer (`mask_zero=True`).
  - LSTM initialized with the concatenated encoder states.
- Attention
  - Additive (Bahdanau-style) attention between decoder outputs and encoder outputs computes a time-dependent context vector.
- Output
  - Decoder output and attention context are concatenated.
  - A dense softmax projection produces token probabilities over the gloss vocabulary.
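The additive-attention step can be sketched in plain NumPy for a single decoder step; the parameter names and shapes below are illustrative, not taken from `model.py`.

```python
import numpy as np

def additive_attention(dec_state, enc_outputs, W_q, W_k, v):
    """Bahdanau-style attention for one decoder step.

    dec_state:   (d_dec,)    current decoder hidden state
    enc_outputs: (T, d_enc)  encoder outputs for all T frames
    W_q, W_k, v: learned parameters projecting into a shared attention space
    Returns the context vector (d_enc,) and attention weights (T,).
    """
    scores = np.tanh(dec_state @ W_q + enc_outputs @ W_k) @ v  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                   # softmax over time
    context = weights @ enc_outputs                            # (d_enc,)
    return context, weights
```

The decoder output for that step is then concatenated with `context` before the softmax projection, as described above.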
Design rationale and training notes are documented in `MODEL_ARCHITECTURE.md` and in the code comments in `model.py`.
### Training visualizations

Below are example training history plots and diagnostics:

- Training history (example run, model 28), trained on the dataset https://www.kaggle.com/datasets/mariusschmidtmengin/phoenixweather2014t-3rd-attempt:
- Training history (example run, model 29), trained on the same dataset:
- Classification training snapshot:
## Preprocessing

- Key scripts:
  - `preprocessing_train_data.py`: prepares training features from raw videos/frames (frame sampling, keypoint extraction, normalization, padding/truncation).
  - `preprecessing_livedata_web.py` / `api/preprocessing_live_data.py`: lightweight live preprocessing pipeline for camera / API inputs.
- Data format
- A sequence is a time-ordered array of per-frame feature vectors: (time_steps, num_features).
- Coordinate normalization is recommended (relative to person or frame) to reduce variation.
- Short sequences are zero-padded; long sequences are truncated or sampled to a fixed maximum length.
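The normalization and padding conventions above can be sketched with NumPy. The function names and the reference-point choice are illustrative assumptions, not the repository's actual preprocessing code.

```python
import numpy as np

def normalize_keypoints(seq, ref_idx=0):
    """Make coordinates relative to one reference keypoint per frame
    (e.g. the nose or a shoulder midpoint). seq: (T, num_keypoints, 2)."""
    seq = np.asarray(seq, dtype=np.float32)
    return seq - seq[:, ref_idx:ref_idx + 1, :]

def fix_length(features, max_len):
    """Zero-pad short sequences and truncate long ones to (max_len, num_features).
    The boolean mask marks real frames so models can ignore the padding."""
    features = np.asarray(features, dtype=np.float32)
    t = min(len(features), max_len)
    out = np.zeros((max_len, features.shape[1]), dtype=np.float32)
    out[:t] = features[:t]
    mask = np.zeros(max_len, dtype=bool)
    mask[:t] = True
    return out, mask
```

The mask pairs naturally with the encoder's masking layer described in the seq2seq section.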
- Recommended workflow
  1. Collect raw videos under `data/`.
  2. Run `preprocessing_train_data.py` to generate feature files.
  3. Inspect features with `check_dataset.py`.
  4. Train with `train.py` or `train-seq2seq.py`.
Example visualization (keypoint and pose preprocessing with MediaPipe):

## Usage & tips

- Recording: Press “Record” and perform signs. The output is gloss-style German tokens, not fully grammatical sentences.
- Non-professional signers: Expect variable recognition quality. Casual or atypical signing can drop accuracy substantially.
- Camera feed missing: Press "Switch Camera" until the correct feed appears. Close other apps that may hold the webcam.
- Slow inference: Close other camera-using apps, free CPU/GPU resources, or use a device with a GPU.
## Technical notes & baseline metrics

- Model: DGS recognition model v1.0.0 with >800 gloss tokens and sentence translation up to 15 tokens.
- Dataset (training baseline): compressed subsets of PHOENIX-Weather-2014T due to local compute limits; this explains the lower initial accuracy.
- Baseline metrics:
- Training accuracy: ~30%
- Validation accuracy: ~25%
- These metrics are a starting point; retraining on full datasets, improved preprocessing and larger models are planned.
## Known issues & workarounds

- Camera feed interference
  - Symptom: No camera image, or flicker.
  - Workaround: Press "Switch Camera" repeatedly; close other apps using the camera.
- Admin privileges required
  - Symptom: Installer or app requests elevated permissions.
  - Note: This release may need admin access for certain tasks. Reductions to this requirement are planned.
- First-run delay
  - Symptom: Blank UI or delayed stream on first launch.
  - Cause: Libraries and models are loading from disk.
  - Workaround: Wait a few seconds for the initial load.
- Limited accuracy for casual signers
  - Symptom: Low recognition quality for non-professional or out-of-distribution signers.
  - Note: Addressed in future training/augmentation plans.
## Roadmap

Planned next steps and goals:

- Improve accuracy substantially (target: 3x improvement over v1.0.0) by:
  - Training on full (non-compressed) datasets.
  - Moving training to larger compute (cloud / supercomputers).
  - Combining multiple datasets and adding synthetic augmentation.
  - Exploring transformer-based architectures and stronger preprocessing.
- Expand vocabulary coverage (thousands of gloss tokens over time).
- Reduce admin-access requirements and harden camera handling.
- Add natural language rendering (convert glosses to grammatical sentences) and multilingual support (ASL planned).
## Contributing

We welcome contributions:

- Star the repo.
- Fork and create a branch:

  ```bash
  git checkout -b feat/my-change
  ```

- Add tests and documentation for your changes.
- Run the test suite and linters.
- Open a Pull Request with a clear description, test instructions, and any migration notes.
Please avoid committing large model binaries — use release assets or external model hosting.
## License

See the LICENSE file in the repository root. The project currently uses a non-commercial license; contact the maintainers if you require a different arrangement.
## Media & acknowledgements

- 🥈 2nd place, Jugend forscht 2024/2025: featured coverage in the Süddeutsche Zeitung and several local and regional outlets.
## Contact

- General / partnerships / press / collaborations: [email protected]. Open to collabs, creative projects, and partnerships.
- Support / troubleshooting: [email protected]. Preferred: open an issue first at GitHub Issues, with reproduction steps and logs.