The product is designed for people with visual impairments, who cannot see pictures but can listen to a description of them. We aim to help them access more information and experience the beauty of the world. The product predicts a caption for an image and displays it to the user. A user can upload an image and play the caption as audio, so that they can "hear" the image.
We use an Image Captioning model in the backend. The model takes an image as input and produces a short textual summary describing its content, using both Computer Vision and Natural Language Processing to generate the caption. It implements an Encoder-Decoder architecture: a Convolutional Neural Network (CNN) encodes the image into a high-level representation, and a Recurrent Neural Network (RNN) then decodes this representation into text. In contrast to traditional models, we have also included Attention, which helps the model focus on the most relevant parts of the image when producing the caption.
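To make the Attention step more concrete, here is a minimal sketch in plain Python. It assumes dot-product scoring between the decoder state and each image-region feature vector; the scoring function used by the trained model is not stated here and may differ. The names `softmax`, `attend`, `region_features` and `decoder_state` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Dot-product attention over image regions (an assumed scoring rule).

    region_features: one feature vector per image region (the encoder output)
    decoder_state:   current decoder hidden state, same length as each vector
    Returns (weights, context): one weight per region, and the weighted
    average of the region vectors that the decoder would consume next.
    """
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
print(weights)  # the first region gets the largest weight
print(context)
```

In a real model the region features would come from the CNN encoder and the state from the RNN decoder, and the weights would be recomputed at every decoding step.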
Home page:
When you upload an image:
demo.mov
- Clone this repo
- Download my trained model and put it in a new "models" folder within the repo directory
- make install
- Open Python Shell and run:
    import spacy
    from spacy.cli.download import download
    download(model="en_core_web_sm")
- Run app.py
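Before running app.py, it can help to confirm that the steps above took effect. The sketch below is a hypothetical helper, not part of this repo: `check_setup` is an illustrative name, and it assumes that the "models" folder sits directly inside the repo directory and that both spaCy and `en_core_web_sm` are installed as importable packages.

```python
import importlib.util
from pathlib import Path


def check_setup(repo_dir):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # The trained model is expected in a "models" folder inside the repo.
    models_dir = Path(repo_dir) / "models"
    if not models_dir.is_dir():
        problems.append('missing "models" folder')
    elif not any(models_dir.iterdir()):
        problems.append('"models" folder is empty')

    # spaCy and its English model are installed as regular Python packages.
    if importlib.util.find_spec("spacy") is None:
        problems.append("spaCy is not installed")
    elif importlib.util.find_spec("en_core_web_sm") is None:
        problems.append("en_core_web_sm is not downloaded")

    return problems


if __name__ == "__main__":
    for problem in check_setup("."):
        print("Setup problem:", problem)
```

Run it from the repo directory. No output means the checks passed, although it does not verify that the model file itself is valid.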
Download the "Image" folder from the above dataset, zip it, and put the zip file in the current Colab working directory.
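If you prefer to create the zip file from Python rather than by hand, the following sketch uses only the standard library. The function name `zip_image_folder` is hypothetical, and the sketch assumes the downloaded folder is named "Image" and sits inside the directory you pass in.

```python
import shutil
from pathlib import Path


def zip_image_folder(parent_dir, folder_name="Image"):
    """Create <parent_dir>/<folder_name>.zip and return its path.

    The archive keeps the folder itself as the top-level entry, so
    unzipping it recreates "Image/..." rather than loose files.
    """
    parent = Path(parent_dir)
    if not (parent / folder_name).is_dir():
        raise FileNotFoundError(f"no {folder_name!r} folder in {parent}")
    archive = shutil.make_archive(
        str(parent / folder_name),  # archive path without the .zip suffix
        "zip",
        root_dir=str(parent),
        base_dir=folder_name,
    )
    return Path(archive)
```

For example, `zip_image_folder(".")` writes `Image.zip` next to the "Image" folder in the current directory, ready to upload to Colab.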

