VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“Jnana Sangama”, Belagavi - 590018
TECHNICAL SEMINAR REPORT
ON
“STUDY OF INFORMATION EXTRACTION FROM UNSTRUCTURED AND
MULTIDIMENSIONAL BIG DATA”
Submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF ENGINEERING
In
COMPUTER SCIENCE & ENGINEERING
By
AKHILA V (1RR16CS006)
Under the guidance of
Dr. KAMAL RAJ T
Assistant Professor,
Dept. of CSE, RRCE
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
RAJARAJESWARI COLLEGE OF ENGINEERING
MYSORE ROAD, BENGALURU-560074
(An ISO 9001:2008 Certified Institute)
(2019-20)
RAJARAJESWARI COLLEGE OF ENGINEERING
MYSORE ROAD, BENGALURU-560074
(An ISO 9001:2008 Certified Institute)
(Affiliated to Visvesvaraya Technological University, Belagavi)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
CERTIFICATE
This is to certify that the Technical Seminar work entitled “STUDY OF INFORMATION
EXTRACTION FROM UNSTRUCTURED AND MULTIDIMENSIONAL BIG DATA” was carried out by
AKHILA V (1RR16CS006), a bona fide student of RajaRajeswari College of Engineering, in partial
fulfillment of the requirements for the award of Bachelor of Engineering in Computer Science
and Engineering of the Visvesvaraya Technological University, Belagavi, during the year
2019-2020. It is certified that all corrections/suggestions indicated for Internal Assessment
have been incorporated in the report. The technical seminar report has been approved as it
satisfies the academic requirements in respect of the Technical Seminar work prescribed for
the said degree.
…………………………..
Signature of Guide
[Dr. Kamal Raj T]
Asst. Professor, Dept. of CSE, RRCE, Bengaluru.

………………………………
Signature of Seminar Coordinator
[Prof. Devi T]
Asst. Professor, Dept. of CSE, RRCE, Bengaluru.

………………………………
Signature of HOD
[Dr. S. Usha]
Prof. & HOD, Dean Research, Dept. of CSE, RRCE, Bengaluru.
ABSTRACT
Information extraction (IE) is the process of extracting useful information from unstructured
or semi-structured data. With the rapid growth of multifaceted (also called multidimensional)
unstructured data, big data poses new challenges for IE techniques. Traditional IE systems
cannot cope efficiently with this deluge of unstructured big data, and its volume and variety
demand improved computational capabilities from these systems. It is therefore necessary to
understand the capabilities and limitations of existing IE techniques for data pre-processing,
data extraction and transformation, and representation of huge volumes of multidimensional
unstructured data. Numerous studies have addressed the challenges and issues of IE for
individual data types such as text, image, audio and video. Very little consolidated research,
however, has investigated the task-dependent and task-independent limitations of IE across
all data types in a single study. This work addresses that gap and presents a systematic
literature review of state-of-the-art techniques for a variety of big data, consolidating all
data types. Recent challenges of IE are also identified and summarized, and potential
solutions are proposed as directions for future research in big data IE. The research is
significant in terms of recent trends and challenges related to big data analytics. Its
outcomes and recommendations will help make big data analytics more productive.
(i)
ACKNOWLEDGEMENT
I express my deep sense of gratitude to Dr. T. Chandrasekar, Prof. and Principal,
RajaRajeswari College of Engineering, Bengaluru, for providing me an opportunity to
undertake this technical seminar work and for his invaluable guidance and help.
I would like to thank Dr. S. Usha, Prof. and Head, Dean Research, Department of Computer
Science & Engineering, RajaRajeswari College of Engineering, for her constant support and
motivation, which inspired me to complete my seminar enthusiastically.
I would also like to thank my Seminar Coordinator, Prof. Devi T, Assistant Professor,
RajaRajeswari College of Engineering, Bengaluru, my Seminar Guide, Dr. Kamal Raj T, Assistant
Professor, Department of Computer Science & Engineering, RajaRajeswari College of
Engineering, Bengaluru, and all the staff members of the department for their co-operation
and guidance, which were very helpful in the successful completion of this seminar work.
I am very thankful to my friends and to all those who supported me directly or indirectly in
the completion of the seminar work.
Finally, I am very thankful to God and my parents for their moral support and
inspiration, without which this seminar work would not have been completed successfully.
AKHILA V
(1RR16CS006)
(ii)
CONTENTS
Chapter                                                        Page No
ABSTRACT ..................................................... i
ACKNOWLEDGEMENT .............................................. ii
Chapter 1: INTRODUCTION ...................................... 01
Chapter 2: LITERATURE SURVEY ................................. 02
Chapter 3: IMPLEMENTATION .................................... 08
    3.1 INFORMATION EXTRACTION FROM TEXT ..................... 08
    3.2 INFORMATION EXTRACTION FROM IMAGES ................... 10
    3.3 INFORMATION EXTRACTION FROM AUDIO .................... 13
    3.4 INFORMATION EXTRACTION FROM VIDEO .................... 16
Chapter 4: RESULTS AND DISCUSSION ............................ 20
CONCLUSION ................................................... 23
FUTURE WORK .................................................. 24
REFERENCES ................................................... 25
LIST OF FIGURES
FIGURE NO       FIGURE NAME                        PAGE NO
Figure 3.1.1    Named Entity Recognition           08
Figure 3.1.2    Relation Extraction                09
Figure 3.2.1    Visual Relationship Detection      11
Figure 3.2.2    Face Recognition                   12
Figure 3.3.1    Acoustic Event Detection           14
Figure 3.3.2    Automatic Speech Recognition       15
Figure 3.4.1    Text Recognition                   17
Figure 3.4.2    Automatic Video Summarization      19