
bkenan/image_captioning_attention


Image Captioning including Attention

Background

This product is designed for people with visual impairments, who cannot see pictures but can listen to a description of them. We aim to give them access to more information and let them experience the beauty of the world. The application predicts a caption for an image and displays it to the user. A user can upload an image and play the caption as audio, so that they can "hear" the image.

Modeling

The backend uses an image captioning model, which takes an image as input and produces a short textual summary describing its content. It combines Computer Vision and Natural Language Processing to generate the captions. The model implements an Encoder-Decoder architecture: a Convolutional Neural Network (CNN) encodes the image into a high-level representation, and a Recurrent Neural Network (RNN) decodes this representation into text. In contrast to a plain encoder-decoder model, we also include Attention, which helps the model focus on the most relevant parts of the image while producing the caption.
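To make the attention step concrete, here is a minimal sketch of soft attention for a single decoding step, written in plain Python. It is not the code from this repo: the README does not say which attention variant the model uses, so the dot-product scoring and the names `attend`, `region_features`, and `decoder_state` are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Return (weights, context) for one decoding step.

    region_features: one feature vector per image region (encoder output).
    decoder_state: current decoder hidden state, same length as each vector.
    Scoring is a plain dot product (an assumption, see the text above).
    """
    # Score each region by how well it matches the decoder state.
    scores = [
        sum(f * h for f, h in zip(features, decoder_state))
        for features in region_features
    ]
    # Normalise the scores into weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted average of the region features.
    dim = len(decoder_state)
    context = [
        sum(w * features[d] for w, features in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
print(weights, context)
```

In a full decoder, the context vector would be fed to the RNN together with the previous word at every step, so the weights can shift to different regions as the caption grows.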

Dataset

Flickr8k dataset from Kaggle

Application pages

Home page:

[screenshot of the home page]

When you upload an image:

[screenshot of the page after uploading an image]

Demo:

demo.mov

Getting started on a local machine:

  1. Clone this repo

  2. Download my trained model and put it in a new "models" folder within the repo directory

  3. Run make install

  4. Open a Python shell and run:

    import spacy

    from spacy.cli.download import download

    download(model="en_core_web_sm")

  5. Run app.py
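Before running app.py, the setup steps above can be checked with a short script. This is only a sketch under stated assumptions: `check_setup` is a hypothetical helper that is not part of the repo, and it only checks that a non-empty "models" folder exists and that spaCy and the `en_core_web_sm` package can be found. It does not validate the model file itself.

```python
import importlib.util
from pathlib import Path


def check_setup(repo_dir="."):
    """Return a list of setup problems; an empty list means none were found."""
    problems = []

    # Step 2: the trained model should sit in a "models" folder in the repo.
    models_dir = Path(repo_dir) / "models"
    if not models_dir.is_dir() or not any(models_dir.iterdir()):
        problems.append('missing or empty "models" folder')

    # Steps 3 and 4: spaCy and its English model are installed as packages.
    if importlib.util.find_spec("spacy") is None:
        problems.append("spaCy is not installed")
    if importlib.util.find_spec("en_core_web_sm") is None:
        problems.append("spaCy model en_core_web_sm is not installed")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("setup problem:", problem)
```

Run it from the repo directory; no output means none of these checks failed.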

Testing the model in Colab:

My Colab notebook

Download the "Image" folder from the dataset above, zip it, and put the archive in the current Colab working directory.
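The zipping step can also be done from Python. The sketch below uses `shutil.make_archive` from the standard library; `zip_image_folder` and its parameters are hypothetical names, and it assumes the downloaded dataset directory contains a folder named "Image" as described above.

```python
import shutil
from pathlib import Path


def zip_image_folder(dataset_dir, out_dir="."):
    """Create Image.zip in out_dir from dataset_dir/Image and return its path."""
    dataset_dir = Path(dataset_dir)
    if not (dataset_dir / "Image").is_dir():
        raise FileNotFoundError(f'no "Image" folder found in {dataset_dir}')

    # make_archive appends ".zip" to the base name itself.
    archive_base = Path(out_dir) / "Image"
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(dataset_dir),  # paths in the archive are relative to this
        base_dir="Image",           # keep the top-level "Image/" folder
    )
```

For example, `zip_image_folder("path/to/dataset")` writes Image.zip to the current directory, which can then be uploaded to the Colab working directory.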
