Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
78 views8 pages

Project

This document discusses developing an end-to-end neural network approach for captioning complex images. The core problem areas are computer vision and generating detailed descriptions of entire scenes from images. Existing methods struggle with detecting multiple overlapping objects and labeling complex, dense images. The proposed approach uses a convolutional network to localize regions of interest, with a recurrent neural network language model to generate natural language captions describing each region and the full scene. The goal is an architecture that jointly performs localization and generation of descriptive label sequences for complex images.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
78 views8 pages

Project

This document discusses developing an end-to-end neural network approach for captioning complex images. The core problem areas are computer vision and generating detailed descriptions of entire scenes from images. Existing methods struggle with detecting multiple overlapping objects and labeling complex, dense images. The proposed approach uses a convolutional network to localize regions of interest, with a recurrent neural network language model to generate natural language captions describing each region and the full scene. The goal is an architecture that jointly performs localization and generation of descriptive label sequences for complex images.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 8

AN END-TO-END TRAINING BASED APPROACH FOR CAPTIONING

COMPLEX IMAGES USING NEURAL NETWORKS

By

Shreyas V Kashyap - 1BG13CS097


Swaroop Gupta B.A -1BG13CS113
Spandana Rao B.A - 1BG13CS103

Under the Guidance of


Smt.Jebah Jaykumar
AREA OF RESEARCH

We have hereby chose the area of research to be a combination of the following


research extensive areas.

Machine Learning

Image Processing
CORE AREA OF PROBLEM

Our core area of problem is

Computer Vision

Program a computer to "understand" a scene or features in an image.

Concerned with the automatic extraction, analysis and understanding of useful


information from a single image or a sequence of images
RELEVANCE OF THE PROBLEM

Object detection

Currently the existing state of the art methods can detect a single object or
multiple non overlapping objects

This makes the detection useless for any analysis of an entire scene

Object labelling

The labelling of single prominent object in an image

Unable to describe a scene in a complex dense detailed image


APPLICATION OF THE PROBLEM

Currently the problem persists in the following applications of Computer Vision

Image search

The existing image search works only on the file name of an image, and not on the
details of the scene
This should be overcome and the search should happen only on the basis of what is
there in the image, instead of the filename

Image scene detection

The existing Image detection, detects a single prominent part of the image and
cannot detect if there are variations of viewpoint.
This should be overcome and multiple objects need to be detected and the whole
scene has to be described.
PROBLEM STATEMENT

Our ability to effortlessly describe all aspects of an image relies on a strong semantic
understanding of a visual scene and all of its elements. However, despite numerous
potential applications, this ability remains a challenge for our state of the art visual
recognition systems

Our goal is to design an architecture that jointly localizes regions of interest and the
describes each with natural language

Architecture is composed of a Convolutional Network, an efficient dense localization


layer, and Recurrent Neural Network language model that generates the label
sequences for the complex images.
INPUT / OUTPUT EXAMPLE

Sample Input :

Output of existing System :

Output Expected :
THANK YOU

You might also like