I’m a researcher working at the intersection of computer vision, natural language processing, and multimodal machine learning. My work focuses on building inclusive, accessible, and impactful systems, especially in the communication and healthcare domains.
My recent research spans:
- 🤟 Sign Language Translation (SLT) — Indian Sign Language dataset curation, video–language modeling
- 🎞️ Multimodal & Motion-Aware Models — vision–language models, temporal modeling, contrastive alignment
- 🧩 Multi-View Clustering & Representation Learning
- 🩺 Medical Image Analysis — segmentation, classification, multimodal fusion
- ⚡ Efficient Training — LoRA/QLoRA, feature caching, GPU-optimized pipelines
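
The efficient-training line of work centers on adapter-style fine-tuning. As a flavor of what that looks like in practice, here is a minimal sketch of wrapping a pretrained model with LoRA adapters via Hugging Face PEFT; the base model, target modules, and hyperparameters are illustrative placeholders, not the exact configuration from my pipelines.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face PEFT.
# Model choice and hyperparameters are placeholders for illustration.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # placeholder backbone

config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5's query/value attention projections
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter weights train
```

Because only the adapter weights receive gradients, the frozen backbone can additionally be loaded in 4-bit precision (the QLoRA setup) to cut memory further.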
Research interests:
- Multimodal Machine Learning
- Sign Language Processing
- Video–Language Models
- Multi-View Learning
- Medical Imaging
- Representation Learning
- Low-bit / Efficient Fine-tuning
Currently working on:
- Building motion-aware video–language models for Indian Sign Language translation
- Designing contrastive pretraining pipelines for aligning video and text embeddings (see the sketch after this list)
- Curating high-quality datasets for SLT
- Exploring efficient training strategies for large vision–language models
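
At its core, the contrastive alignment step reduces to a symmetric InfoNCE objective over paired video and text embeddings. A minimal PyTorch sketch, assuming each encoder (omitted here) produces pooled (batch, dim) embeddings; the temperature value is illustrative.

```python
# Symmetric InfoNCE loss for video-text contrastive alignment (CLIP-style).
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """video_emb, text_emb: (batch, dim) pooled encoder outputs."""
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Cosine-similarity logits between every video and every caption in the batch.
    logits = video_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: match video-to-text and text-to-video.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Random embeddings standing in for encoder outputs.
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```

Every mismatched pair in the batch serves as a negative, which is why large batches (or cached features, as in the efficient-training work above) help this objective.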
Languages: Python · C · Shell
ML frameworks: PyTorch · Hugging Face Transformers · Lightning · OpenCV · scikit-learn
Video tooling: Decord · MMCV · FFmpeg (frame-sampling sketch below)
Workflow: WandB · tmux · LaTeX
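
Since Decord appears in the stack: a minimal sketch of the uniform frame sampling it typically handles in a video-language data pipeline. The file path and frame count are illustrative placeholders.

```python
# Uniform frame sampling with Decord for a video-language pipeline.
import numpy as np
from decord import VideoReader, cpu

def sample_frames(path: str, num_frames: int = 16) -> np.ndarray:
    """Return (num_frames, H, W, 3) uint8 frames sampled uniformly."""
    vr = VideoReader(path, ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num_frames).astype(int)
    return vr.get_batch(indices).asnumpy()

frames = sample_frames("example.mp4")  # placeholder path
```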