
Special Issue - 2017 International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181
ICIATE - 2017 Conference Proceedings

Text Recognition and Extraction from Video


Kiran Agre
B.E. Student, Computer Branch, Atharva College of Engineering, Mumbai, India

Ankur Chheda
B.E. Student, Computer Branch, Atharva College of Engineering, Mumbai, India

Sairaj Gaonkar
B.E. Student, Computer Branch, Atharva College of Engineering, Mumbai, India

Prof. Mahendra Patil
Head of Computer Department, Atharva College of Engineering, Mumbai, India

Abstract— Videos have become a great source of information. The text in a video carries a large amount of information and data, but it is not in editable form. Converting this text to an editable form makes it simpler and more efficient to store the useful information. This paper describes a technique for extracting the text that occurs in video, with a focus on educational and news videos. The user provides a video as input, and the system processes it and generates the output in an editable text file.

Keywords— Frames, Text Recognition, MSER, OCR, Gray Scale, MPEG

I. INTRODUCTION
With the rapid advancement of technology and increasing internet speeds, people's attention is shifting from television to YouTube. The main advantage of YouTube over television is that YouTube provides shows at the user's preference irrespective of time, whereas television programs air at fixed times, which creates a time constraint for users. As attention shifts towards YouTube, this paper proposes a system that makes it easy for users to access the information contained in the text of these videos in an efficient and quick way. The proposed system converts the text in a video into editable form and stores it in a text file.

YouTube is widely used for news and educational videos. These videos contain text that adds information and makes them more meaningful. If the text from the videos is converted to editable form, it can be stored efficiently and accessed more easily later. Once a user has watched an educational video, he may not want to go through the entire video again; reading the main points may be sufficient to revise the topic. In such cases the proposed system gives the user access to the information by converting the text in the video to editable form. The editable form is a text file. The main advantage of the text file format is that it requires very little space compared to the size of a video. The information in a text file can also be edited if it changes in the future, or if the user wants to add anything to it, which is not possible with video.

The working of the proposed system is simple. The user downloads the video from YouTube or any other website from which he wants to extract text. This video is provided as input to the proposed system, which converts the video into a series of frames and applies text detection and extraction to each frame. The detected text from each frame is stored in a text file.

II. LITERATURE REVIEW
Datong Chen and Jean-Marc Odobez [1] have proposed a system that minimizes character error rates and removes noise from characters that greatly disturbs optical character recognition.
Matti Pietikainen and Oleg Okun [2] have proposed a combined edge-based text detection method that minimizes degradation in the extracted text and can work with images having complex backgrounds.
C. P. Sumathi and N. Priya [3] have proposed a combined edge-based method [2]. This method is sensitive to skew and text orientation.
Z. Cernekova, C. Nikou, and I. Pitas [4] have proposed a system that uses entropy-based metrics. It checks the color histogram of each frame against the histogram of the next consecutive frame. This method fails when two different images have exactly the same color histogram values.
Priti Rege and Chanchal Chandrakar [5] have explained text-image separation in document images using boundary/perimeter detection. Text detection is performed using the Sobel operator and thresholding. As no text enhancement is used, the extracted text can be noisy.
Arvind and Mohamed Rafi [6] have explained text extraction using a connected component based method. The prerequisite for this method is that the text should have more contrast than its background.
Lifang Gu [7] explained text detection in MPEG (Moving Picture Experts Group) video frames. It reduces spatial and temporal data redundancies. This method is only applicable to MPEG videos.
Bassem Bouaziz, Tarek Zlitni, and Walid Mahdi [8] explained automatic video text extraction. It performs content-based video indexing. This method can detect only static superimposed text.
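The histogram comparison used by [4] can be illustrated with a small sketch. This is a toy illustration in Python rather than the authors' implementation: frames are reduced to flat lists of pixel values, and the function names and threshold are assumptions.

```python
from collections import Counter

def color_histogram(frame):
    # A frame is modelled as a flat list of pixel values; a real system
    # would build one histogram per color channel of a 2-D image.
    return Counter(frame)

def histogram_difference(h1, h2):
    # Sum of absolute bin-wise differences between two histograms.
    bins = set(h1) | set(h2)
    return sum(abs(h1[b] - h2[b]) for b in bins)

def shot_changes(frames, threshold):
    # Indices where a frame's histogram differs enough from its
    # predecessor's to suggest a content change.
    changes = []
    for i in range(1, len(frames)):
        d = histogram_difference(color_histogram(frames[i - 1]),
                                 color_histogram(frames[i]))
        if d > threshold:
            changes.append(i)
    return changes
```

The failure case noted above falls out directly: two different frames with identical histograms (e.g. `[1, 2]` and `[2, 1]`) give a difference of 0 and are treated as the same shot.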

Volume 5, Issue 01 Published by www.ijert.org



Punit Kumar and P. S. Puttaswamy [9] have proposed a system that performs area-based filtering to eliminate noise blobs present in the image. This method fails when the background has greater intensity transitions.

III. PROBLEM STATEMENT
There are two types of text occurring in a video:
• Natural text
• Superimposed text
a) Natural/scene text: Natural text is text that occurs in the video when it is being recorded; it is part of the scene where the video is recorded. Examples: house numbers, car plate numbers.

Figure 1 Natural Text

b) Superimposed text: Superimposed text is text that is not part of the video when it is recorded but is overlaid later to give extra information about a particular scene. Example: text occurring in news videos.

Figure 2 Superimposed Text

Natural text is of little use as it carries information of less significance, whereas superimposed text carries information of great importance. Hence the main aim of the proposed system is to detect superimposed text occurring in the video.

IV. EXISTING SYSTEMS AND THEIR GAPS
Many systems have been developed to detect text in video. Each is based on a particular method and has a drawback associated with it. Two of the commonly used text detection methods are:
• Sliding Window based method
• Connected Component based method

3.1 Sliding Window based method:
This method uses a sliding window to search for text. It takes a small rectangular patch of the given image, of fixed dimensions, and slides it over the entire image to check whether or not there is text in that patch. Different sliding-window classifiers are used to decide whether the patch contains text. The window is initially placed at the top-left corner of the image and slides over successive locations, starting with the first row and then moving through the further rows. This method is slow, as the image has to be processed at multiple scales, and even if the text is present at the bottom of the image the window has to start from the top. The accuracy of the detected text also depends on the dimensions of the window.

3.2 Connected Component based method:
In the connected component based approach, pixel regions with similar color, edge strength, or texture are first extracted, and each region is then evaluated as text or non-text using machine learning techniques. This method is efficient for caption text on plain backgrounds, but it does not work well for images with cluttered backgrounds.

V. COMPARISON WITH PROPOSED SYSTEM
Unlike previous systems, which showed the detected text in the video frame, the proposed system stores the text output in a separate text file. The advantage of this is that the algorithm need not run every time the video is played. The proposed system also does not compare consecutive frames to detect text regions, unlike the system proposed by Z. Cernekova, C. Nikou, and I. Pitas [4], which may mistake any new object introduced in a successive frame for text. The proposed system is able to detect text even when two sentences appear with different font sizes.

VI. SYSTEM OVERVIEW
The proposed system has three main components:
• Frame Generation
• Text Recognition and Extraction
• Text File Generation

4.1 Frame Generation: In this step, the video is converted into frames. A frame is an image taken at a particular time in the video. Frames are generated at regular time intervals so that text in successive frames is not repeated too often. These frames can be saved in any image format.
The user has two options when converting the video to frames: converting the entire video, or converting only a selected portion of it. Users who want text from the entire video select the first option. Users who want text from only a particular time range select the second option, where they specify the start and end times for text extraction. The selected portion of the video is converted into images, which are stored in a separate folder for easy access when the text extraction algorithm is applied.
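The frame-selection arithmetic behind these two options can be sketched as follows. This is an assumed helper, not the paper's implementation; the names `frame_indices`, `fps`, and `sample_every_sec` are illustrative, and a real system would decode the video with a media library.

```python
def frame_indices(start_sec, end_sec, fps, sample_every_sec):
    # Frame numbers to extract when sampling the selected portion of a
    # video at a regular interval (so text is not repeated too often).
    step = max(1, round(fps * sample_every_sec))   # frames between samples
    first = round(start_sec * fps)                 # first frame of the range
    last = round(end_sec * fps)                    # last frame of the range
    return list(range(first, last + 1, step))
```

For the whole-video option, `start_sec` is 0 and `end_sec` is the video duration; for a selected portion, they come from the user's chosen start and end times. At 25 fps, sampling every 2 seconds over the range 10-20 s yields frames 250, 300, ..., 500.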


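As background for the region-based detection used in the following steps, here is a minimal sketch of connected-component extraction (section 3.2) on a binary image. It groups foreground pixels by adjacency only; the similarity measures (color, edge strength, texture) and the text/non-text classifier mentioned earlier are omitted, and the function name is an assumption.

```python
from collections import deque

def connected_components(binary):
    # Group 4-connected foreground pixels (value 1) of a binary image
    # into components; each component is a set of (row, col) positions.
    rows, cols = len(binary), len(binary[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and (r, c) not in seen:
                queue = deque([(r, c)])   # breadth-first flood fill
                seen.add((r, c))
                comp = set()
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                components.append(comp)
    return components
```

Each resulting component would then be classified as text or non-text; in the proposed system this role is played by MSER regions filtered by stroke width, as described in Section VII.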


4.2 Text Recognition and Extraction: This step is applied to every frame. The text region is detected using the algorithm described in Section VII. The detected text regions are then refined to increase the efficiency of text extraction, and the text extraction algorithm is applied to the refined regions. The efficiency of text detection depends on the font color, text size, background color, and resolution of the video [9].

Figure 3 System Overview

4.3 Text File Generation: The extracted text is stored in a text file. For every frame, the generated text is appended to the text already in the file. Once text has been extracted from all the images, the path of the output file is given to the user. The text file is very small compared to the size of the video; this saves memory and also makes access to the information quicker.

VII. TEXT RECOGNITION AND EXTRACTION ALGORITHM
Step-1 Text Region Recognition: The MSER (Maximally Stable Extremal Regions) algorithm is used to detect candidate text regions in the given image. MSER first converts the color image into a gray scale image. It then selects regions that remain stable over a range of thresholds: at a given threshold, all pixels at or above it are black and all pixels below it are white.

Step-2 Removal of Non-Text Regions: MSER may also detect non-text regions. Stroke width is used to discriminate between text and non-text regions. Stroke width is a measure of the width of the curves and lines that make up the characters. Text regions have little stroke-width variation, whereas non-text regions have larger variation.

Step-3 Merge Text Regions for Final Detection: The detection results so far consist of individual text characters. To use them for the recognition task, the individual characters must be merged into words. This enables the recognition of actual words in the image.

Figure 4 Text Recognition and Extraction Algorithm

Step-4 Recognize Detected Text Using OCR: After the text regions are detected, an OCR method, such as an edge-based method, is used to recognize the text.

VIII. FUTURE SCOPE
The proposed system can be enhanced by allowing the user to select a particular portion of the screen, so that only the text occurring in that part is extracted. This will be useful when the user wants text from only a particular region. Another enhancement is to let the user provide the URL of the video instead of the video itself, so that the system downloads the video automatically and extracts text from it.

IX. CONCLUSION
In this paper we discussed our proposed method of detecting and extracting text from video. The system automates the manual process of extracting text from videos and is hence economical in terms of time and human effort. The system will be implemented in MATLAB. It is mainly intended for educational and news videos, which contain information in the form of text.




X. REFERENCES
[1] Datong Chen, Jean-Marc Odobez, "Text detection and recognition in images and video frames", Pattern Recognition, 2004, pp. 595-608.
[2] Matti Pietikainen, Oleg Okun, "Text extraction from grey scale page images by simple edge detectors", Machine Vision and Intelligent Systems Group.
[3] C. P. Sumathi, N. Priya, "A Combined Edge-Based Text Region Extraction from Document Images", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 8, August 2013, ISSN: 2277-128X.
[4] Z. Cernekova, C. Nikou, I. Pitas, "Shot detection in video sequences using entropy based metrics", Proceedings of the International Conference on Image Processing, Volume 3, 2002.
[5] Priti P. Rege, Chanchal A. Chandrakar, "Text-Image Separation in Document Images Using Boundary Perimeter Detection", ACEEE International Journal on Signal & Image Processing, Vol. 03, No. 01, Jan 2012.
[6] Arvind, Mohamed Rafi, "Text Extraction from Images Using Connected Component Method", Journal of Artificial Intelligence Research & Advances, Volume 1, Issue 2, 2014.
[7] Lifang Gu, "Text Detection and Extraction in MPEG Video Sequences", Proceedings of the International Workshop on Content-Based Multimedia Indexing, 2001, pp. 233-240.
[8] Punit Kumar, P. S. Puttaswamy, "Video to Frame Conversion of TV News Video by Using MATLAB", IJARSE, Vol. No. 3, Issue No. 3, March 2014.
[9] Punit Kumar, P. S. Puttaswamy, "Moving text line detection and extraction in TV video frames", IEEE International Advance Computing Conference (IACC), 2015.
