AI-powered image captioning using InceptionV3+LSTM and ViT-GPT2 models. Trained on Flickr8k dataset with interactive Streamlit interface.
Updated Oct 27, 2025 - Jupyter Notebook
A Chrome extension that takes input images and generates captions for them.
Developed an image captioning system using the BLIP model to generate detailed, context-aware captions. Achieved an average BLEU score of 0.72, providing rich descriptions that enhance accessibility and inclusivity.
Flask-based AI app that summarizes surveillance videos using Whisper (audio), ViT-GPT2 (frame captions), and Groq LLM (narratives). Produces both general and law enforcement-style summaries.
A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.
An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.
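The last description mentions integrating detected objects into the generated caption. One way this could work is sketched below, under stated assumptions: the caption string is taken to come from a ViT-GPT2 model and the labels from a detector such as YOLOv8, but neither model is called here. The function name `enhance_caption` and the "Also detected" wording are hypothetical and not taken from any of the listed repositories.

```python
from typing import Iterable


def enhance_caption(caption: str, detected_labels: Iterable[str]) -> str:
    """Append detected object labels that the caption does not already mention.

    Hypothetical helper: `caption` stands in for a captioning model's output,
    `detected_labels` for the class names returned by an object detector.
    """
    caption_clean = caption.strip().rstrip(".")
    caption_words = set(caption_clean.lower().split())
    extras = []
    for label in detected_labels:
        key = label.lower()
        # Skip labels already covered by the caption or already queued.
        if key in caption_words or key in extras:
            continue
        extras.append(key)
    if not extras:
        return caption_clean + "."
    return f"{caption_clean}. Also detected: {', '.join(extras)}."


if __name__ == "__main__":
    print(enhance_caption("a dog running on grass", ["dog", "frisbee", "person"]))
```

The match is a simple whole-word comparison, so a multi-word label such as "traffic light" is always treated as new. A real app would likely need a more careful merge.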