ABSTRACT
The field of text summarization has evolved alongside advances in Natural Language
Processing (NLP) and computer science. This project investigates text summarization and
gives an overview of different approaches to generating summaries automatically. It also
clarifies the requirements and scope for implementing a summarizer for video transcripts.
A transcript can be shortened by first applying extractive summarization with a BERT
model and then applying the T5 model. The current implementation uses Sumy with its Latent
Semantic Analysis (LSA) summarizer. The quality of the generated summaries was evaluated
by comparing them with human-written summaries; the evaluation showed that the summarizer
performs consistently and produces readable summaries.
CONTENTS
1. INTRODUCTION
1.1 PURPOSE
1.2 SCOPE
2. FUNCTIONAL REQUIREMENTS
2.1 DEFINITION
2.2 USE CASES
2.3 HARDWARE REQUIREMENTS
2.4 SOFTWARE REQUIREMENTS
3. USER REQUIREMENTS
3.1 USER REQUIREMENTS
3.2 REQUIREMENTS OF UPDATED VERSION
3.3 FEATURES OF THE NEW SYSTEM
3.4 DATA FLOW DIAGRAM
3.5 UML MODELLING
3.6 CONTEXT DIAGRAM
4. E-R DIAGRAM
5. SCREEN SHOT
5.1 GOOGLE CHROME BROWSER
5.1.1 BROWSER HOME PAGE
5.1.2 CHROME EXTENSIONS ICON
5.1.3 CHROME EXTENSION STORE
5.1.4 SEARCHING THE CHROME EXTENSION YT SUMMARIZER
5.1.5 YT SUMMARIZER CHROME EXTENSION ADD TO CHROME BLUE BUTTON
5.1.6 CHROME EXTENSION FOR YOUTUBE TRANSCRIPT SUMMARIZER
5.2 RESULT
6. TESTING
6.1 SYSTEM TESTING
7. CONCLUSION
7.1 LIMITATION
7.1.1 SUPPORTS ONLY 1024 WORDS FOR SUMMARIZATION
7.1.2 ACCURACY OF SUMMARY
7.1.3 AUDIO ACCURACY AND QUALITY
7.1.4 SUBTITLE ELIGIBILITY IN VIDEOS
7.2 CONCLUSION
8. REFERENCES
LIST OF FIGURES
Figure No.    Figure Name
Use Cases
Figure 2.1    Accessing Chrome Extension
Figure 2.2    User Login or Signup
Figure 2.3    Admin has provided two formats of summarization
Data Flow Diagrams
Figure 3.1    DFD Level 0
UML Modelling
Figure 3.2    UML Diagram
Figure 3.3    Context Diagram
Figure 4.1    E-R Diagram
CHAPTER – 1
INTRODUCTION
1.1 PURPOSE
This Software Requirements Specification provides a complete description of all the
functions and specifications of the Chrome Extension YouTube Transcript Summarizer.
In this project, I have created a Chrome Extension that sends a request to a backend REST
API. The backend performs the NLP processing and responds with a summarized version of the
YouTube transcript.
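The request and response flow described above can be sketched as follows. This is a
minimal illustration and not the project's actual backend: the `video_id` query parameter,
the `/api/summarize` path and the names `make_app`, `call_app`, `fetch_transcript` and
`summarize` are all assumptions made for this example. It uses only the Python standard
library (a WSGI callable), and the transcript lookup and the NLP step are passed in as
placeholder functions.

```python
import json
from urllib.parse import parse_qs


def make_app(fetch_transcript, summarize):
    """Build a WSGI app that answers GET ?video_id=<id> with a JSON summary.

    fetch_transcript(video_id) -> str and summarize(text) -> str are
    hypothetical placeholders for the transcript lookup and the NLP step.
    """

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        video_id = params.get("video_id", [""])[0]
        if not video_id:
            status = "400 Bad Request"
            payload = {"error": "video_id is required"}
        else:
            transcript = fetch_transcript(video_id)
            status = "200 OK"
            payload = {"video_id": video_id, "summary": summarize(transcript)}
        body = json.dumps(payload).encode("utf-8")
        headers = [("Content-Type", "application/json"),
                   ("Content-Length", str(len(body)))]
        start_response(status, headers)
        return [body]

    return app


def call_app(app, query_string):
    """Call the WSGI app in-process (no network) and return (status, parsed JSON)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET",
               "PATH_INFO": "/api/summarize",  # assumed endpoint path
               "QUERY_STRING": query_string}
    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body.decode("utf-8"))
```

To try it locally, the app returned by `make_app` could be handed to
`wsgiref.simple_server.make_server`; the extension would then request the endpoint with
the ID of the video currently open in the browser.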
1.2 SCOPE
An enormous number of video recordings are created and shared on the Internet every
day. It has become difficult to find the time to watch such videos, which may run longer
than expected, and the effort is wasted if the video turns out not to contain the
information we need. Automatically summarizing the transcripts of such videos lets us
quickly pick out the important points and saves the time and effort of going through the
whole video.
This document has 8 chapters. The first chapter introduces the project and states its
purpose and scope. The second chapter covers the functional requirements, the use cases,
and the hardware and software requirements. The third chapter covers the user requirements
and the system analysis and design diagrams. The fourth chapter gives the E-R diagram. The
fifth chapter shows screenshots of the Chrome Extension. The sixth chapter describes the
testing carried out to confirm that the project works properly without distracting from
the browsing experience. The seventh chapter covers the limitations, the conclusion and
future enhancements of the project. The final chapter lists the references.
This document describes how the Extension was developed: its primary requirements, its
features and functionality, and the procedures followed to achieve these objectives.
YouTube Transcript Summarizer is a Chrome Extension that gives a short summary of the
contents of a video, so that users can save time and get to the information they are
looking for more quickly.
For example, when a student watches a video to understand a topic, my aim is to help by
obtaining the transcript (subtitles) of that video and summarizing it, so that the student
can understand the topic in a faster and simpler way. The summary also helps the student
with learning and note-making.
The Extension generates a short summary from the transcript of a video. It is useful for
people who are looking for a specific video, and most useful for video conferences, where
it can produce summarized notes.
CHAPTER – 2
FUNCTIONAL REQUIREMENTS
2.1 FUNCTIONAL REQUIREMENTS DEFINITIONS
Functional Requirements are those that refer to the functionality of the system, i.e.,
what services it will provide to the user. Non-functional (supplementary) requirements pertain
to other information needed to produce the correct system and are detailed separately.
2.2 USE CASES
The system is meant to serve users in several ways: it is intended to summarize not only
YouTube videos but also videos from other websites and video conferences from different
regions, with language-specific summarization so that users can understand the content in
their own language.
➢ The user can do the following in the Chrome Extension:
o Summarization in text
o Summarization in audio
o Make notes
o Understand the main highlights
o Create questionnaires for exam purposes
o Quick revision
2.2.1 Use Case: Access Chrome Extension
Fig. 2.1 Accessing Chrome Extension
Brief Description:
The user accesses the YouTube Transcript Summarizer through its Chrome Extension.
Initial step-by-step description:
To initiate this use case, the user installs and opens the YouTube Transcript Summarizer
Chrome Extension as follows:
1. The user connects to the system using a web browser; the Chrome browser is required.
2. The user selects the Extensions icon at the top-right corner of the Chrome browser,
which looks like a small grey puzzle piece.
3. The system takes the user to the Chrome Extensions page, where all the extensions are
available.
4. The user finds the search box on the Chrome Extensions page and types the name of the
extension, “YTSUMMARIZER”.
5. The user clicks the extension that appears in the search results and then clicks the
blue Add to Chrome button; the extension is added to the user's Chrome browser.
6. The user should pin the extension using the pin icon next to the extension's name.
7. Once it is pinned, the user can open the extension at any time without searching for
it again.
2.2.2 Use Case: User Login or Signup
Fig. 2.2 User Login or Signup
Brief Description:
The user does not need to log in or sign up to access the Chrome Extension if they
already have a Chrome browser account.
Initial step-by-step description:
To initiate this use case, the user must be using the Chrome browser.
1. The system takes the user to the Chrome Extensions page, where all the extensions
are available.
2. The user finds the search box on the Chrome Extensions page and types the name of
the extension, “YTSUMMARIZER”.
3. The user clicks the extension that appears in the search results and then clicks the
blue Add to Chrome button; the extension is added to the user's Chrome browser.
4. The user should pin the extension using the pin icon next to the extension's name.
5. Once it is pinned, the user can open the extension at any time without searching
for it again.
2.2.3 Use Case: Two Formats of Summarization
Fig. 2.3 Admin has provided two formats of summarization: Text and Audio.
Brief Description:
The admin has provided two formats of summarization for the user: text and audio.
Initial step-by-step description:
To initiate this use case, the user must select one of the two available formats of
summarization.
1. The user clicks the button for the format in which they want the summary.
2. The two buttons are text summarization and audio summarization.
3. When the user selects text summarization, the summary is processed if the video is
subtitle eligible and is then shown on the same page in a popup-like box, from which it
can be copied and pasted.
4. When the user selects audio summarization, a summarized audio is generated and
played.
5. The user can then make notes, or record the audio on a recording device by playing it
aloud.
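The choice between the two formats in the steps above can be sketched as a small dispatch
function. This is only an illustration under stated assumptions: `deliver_summary`, the
`"text"` and `"audio"` format names, and the injected `summarize` and `synthesize_speech`
callables are hypothetical, and a missing transcript (`None`) stands for a video that is
not subtitle eligible.

```python
def deliver_summary(fmt, transcript, summarize, synthesize_speech):
    """Return the summary in the requested format, or an error description.

    transcript is None when the video is not subtitle eligible.
    summarize(text) -> str and synthesize_speech(text) -> bytes are
    hypothetical placeholders for the NLP and text-to-speech steps.
    """
    if transcript is None:
        return {"ok": False, "error": "video is not subtitle eligible"}
    if fmt not in ("text", "audio"):
        return {"ok": False, "error": f"unknown format: {fmt}"}
    summary = summarize(transcript)
    if fmt == "text":
        # Shown in the popup box, where the user can copy it.
        return {"ok": True, "format": "text", "summary": summary}
    # Audio: the summary text is converted to speech and played back.
    return {"ok": True, "format": "audio", "audio": synthesize_speech(summary)}
```

Both formats share the same summarization step, so only the final delivery differs
between the text button and the audio button.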
2.3 HARDWARE REQUIREMENTS
Hardware Requirements:
Processor: Any
Chrome Browser: version 92 or 93 (an updated version, if not the very latest)
RAM: Any
Hard Disk: Any
Disk space: Any
2.4 SOFTWARE REQUIREMENTS
Software Requirements:
Python 3.10 (the latest version at the time of development)
JavaScript, HTML and JSON support installed in VS Code
.NET Framework 4.8, or the nearest version below it down to 3.5
Natural language processing
Transformers
YouTube Transcript API
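In the project, the NLP step relies on library summarizers (the abstract names Sumy with
its LSA summarizer). As a rough illustration of what extractive summarization does, the
sketch below scores each sentence by the average frequency of its words and keeps the
highest-scoring sentences in their original order. This is a deliberately simpler
stand-in technique, not LSA and not the project's code; the function names and the small
stop-word list are assumptions made for the example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "that", "this", "for", "on", "with", "as", "was", "be"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, max_sentences=2):
    """Keep the max_sentences sentences with the highest average word frequency."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

Because the output is built only from sentences that already appear in the transcript,
the summary stays faithful to the wording of the video, which is the defining property of
the extractive approach.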
CHAPTER – 3
USER REQUIREMENTS
3.1 USER REQUIREMENTS
Users are expected to recommend this extension to others and to email the admin with
feedback on how it works.
o Time saved through text summarization
o Feedback
o A proper internet connection and clear audio
3.2 REQUIREMENTS OF UPDATED VERSION
▪ Speed
▪ Accuracy
▪ Larger videos eligible for summarization
▪ Summarization of videos that are not subtitle eligible.
3.3 FEATURES OF THE NEW SYSTEM
The new system has been designed according to the user requirements so as to fulfil
almost all of them.
a. Speedy Processing
A summary can be generated much more quickly than with the existing extension, because
previously generated summaries are stored in the cloud and can be viewed again without
being regenerated. This reduces the time needed to get both audio and text summaries.
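The reuse of previously generated summaries can be sketched as a cache keyed by video ID.
This is a minimal illustration, not the project's storage layer: the class name
`SummaryCache` is hypothetical, and an in-memory dictionary stands in for the cloud
storage mentioned above.

```python
class SummaryCache:
    """Return a stored summary when one exists; otherwise compute and store it.

    summarize(text) -> str is a hypothetical placeholder for the NLP step.
    A plain dict stands in for the cloud database.
    """

    def __init__(self, summarize):
        self._summarize = summarize
        self._store = {}
        self.misses = 0  # number of times a summary had to be computed

    def get(self, video_id, transcript):
        if video_id not in self._store:
            self.misses += 1
            self._store[video_id] = self._summarize(transcript)
        return self._store[video_id]
```

With this arrangement the expensive summarization runs only the first time a video is
requested; later requests for the same video are answered from storage.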
b. Accuracy
One of the most important drawbacks of the current system is that the audio summary is
not as accurate as it could be and cannot be generated for longer videos because of the
word limit. The new system will generate the result as soon as the user's summarization
request is processed and will also store it in the database for future use.
c. High-quality Audio
The new system makes it easy to store and retrieve information as required. Users do not
have to store anything themselves, because storage happens automatically in the cloud.
This avoids the data management problems faced in the current system, as the new system
has a Database Management System that needs only one-time access.
d. Zero Cost and No Advertisements
This extension is a unique service provider: it shows no advertisements and provides its
service at no cost.
3.4 DATA FLOW DIAGRAM
A DFD (also known as a bubble chart) is a simple graphical formalism that represents a
system in terms of its input data, the processes carried out on that data, and the
output data the system generates.
The DFD technique is popular mainly because it is a very simple formalism that is easy
to understand and use. A DFD model uses a very limited number of primitive symbols to
represent the functions performed by a system and the data flow among those functions.
Starting from the set of high-level functions that a system performs, a DFD model
represents the various sub-functions hierarchically.
Fig. 3.1 DFD level 0
3.5 UML MODELLING
Fig. 3.2 UML Modelling
3.6 CONTEXT DIAGRAM
The context diagram is a top-level view of an information system that shows the
boundaries and scope. It describes the main objective of the system and the entities
involved.
Fig. 3.3 Context Diagram
CHAPTER – 4
E–R DIAGRAM
E-R DIAGRAM:
Fig. 4.1 E-R Diagram
CHAPTER – 5
SCREEN SHOTS
5.1 GOOGLE CHROME BROWSER:
5.1.1 BROWSER HOME PAGE
5.1.2 Chrome Extensions Icon
5.1.3 Chrome Extension store
5.1.4 Searching the chrome extension YT Summarizer
5.1.5 YT Summarizer chrome extension add to chrome blue button
5.1.6 Chrome Extension for YouTube Transcript Summarizer
YT Summarizer
5.2 RESULT
CHAPTER – 6
TESTING
6.1 SYSTEM TESTING:
The objective of system testing is to ensure that all individual programs are working
as expected, that the programs link together to meet the requirements specified and to ensure
that the computer system and the associated clerical and other procedures work together.
The initial phase of system testing is the responsibility of the analyst, who determines
what conditions are to be tested, generates test data, produces a schedule of expected
results, runs the tests, and compares the computer-produced results with the expected
results.
The analyst may also be involved in procedures testing. When the analyst is satisfied
that the system is working properly, the system is handed over to the users for testing.
The importance of system testing by the user must be stressed: ultimately, it is the user
who must verify the system and give the go-ahead.
During testing, the system is used experimentally to ensure that the software does not
fail, i.e., that it will run according to its specifications and in the way users expect it to. Special
test data is input for processing (test plan) and the results are examined to locate unexpected
results.
A limited number of users may also be allowed to use the system so analysts can see
whether they try to use it in unexpected ways. It is preferable to find these surprises before
the organization implements the system and depends on it. In many organizations, testing is
performed by persons other than those who write the original programs. Using persons who
do not know how certain parts were designed or programmed ensures more complete and
unbiased testing and more reliable software.
The system is tested as a complete, integrated system. System testing first occurs in
the development environment but eventually is conducted in the production environment.
Functionality and performance testing are designed to catch bugs in the system, unexpected
results, or other ways in which the system does not meet the stated requirements.
The testers create detailed scenarios to test the strength and limits of the system, trying
to break it if possible. Editorial reviews not only correct typographical and grammatical errors,
but also improve the system’s overall usability by ensuring that on-screen language is clear
and helpful to users. Accessibility reviews ensure that the system is accessible to users with
disabilities.
System testing consists of the following five steps:
i. Program testing
ii. String testing
iii. System testing
iv. System documentation
v. User acceptance testing
CHAPTER – 7
CONCLUSION
7.1 LIMITATION
The new system has been designed to meet almost all of the user requirements, but it
still has certain limitations, some of which can be addressed in future enhancements or
updates.
7.1.1 Supports only 1024 words for summarization
The existing modules and models can summarize at most 1024 words, and sometimes fewer
because of limited accuracy. Future versions are expected to overcome this limit by
using updated models in the backend.
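One common way to work within such a limit is to split a long transcript into chunks and
summarize each chunk separately. The sketch below assumes the limit is counted in
whitespace-separated words, as stated above; transformer models usually count tokens
rather than words, so the real limit may differ. The function names and the injected
`summarize` callable are hypothetical, and this is an illustration rather than the
project's planned solution.

```python
def split_into_chunks(transcript, max_words=1024):
    """Split the transcript into consecutive chunks of at most max_words words."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long(transcript, summarize, max_words=1024):
    """Summarize each chunk on its own and join the partial summaries in order.

    summarize(text) -> str is a hypothetical placeholder for the model call.
    """
    chunks = split_into_chunks(transcript, max_words)
    return " ".join(summarize(chunk) for chunk in chunks)
```

Splitting on a fixed word count can cut a sentence in half at a chunk boundary; splitting
at sentence boundaries instead would avoid that at the cost of slightly uneven chunks.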
7.1.2 Accuracy of summary
The existing system does not provide a high rate of accuracy, so it is hard to depend on
this model alone for future work: if a summary is not accurate enough, we may miss some
good video content.
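One simple way to put a number on summary accuracy is to measure word overlap between a
generated summary and a human-written one, in the spirit of ROUGE-1. The source does not
say which measure was used in the evaluation, so the sketch below is an assumption chosen
for illustration; the function name `unigram_overlap` is hypothetical.

```python
import re
from collections import Counter


def unigram_overlap(candidate, reference):
    """Return (precision, recall, f1) of word overlap between two summaries.

    candidate is the generated summary; reference is the human-written one.
    Words are lower-cased, and repeated words are matched at most as often
    as they occur in both texts.
    """
    cand = Counter(re.findall(r"[a-z0-9']+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9']+", reference.lower()))
    overlap = sum((cand & ref).values())  # multiset intersection
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, comparing the candidate "the cat sat" with the reference "the cat sat on the
mat" gives a precision of 1.0 (every candidate word appears in the reference) and a
recall of 0.5 (half of the reference words are covered).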
7.1.3 Audio accuracy and quality
The system currently does not provide audio summaries of longer videos, because the
low-level modules it relies on struggle to generate and process a large amount of
summary text and cannot handle long transcripts.
7.1.4 Subtitle eligibility in videos
The existing system applies an eligibility criterion before it can summarize a
YouTube video in either format, because it cannot itself transcribe audio whose
language it does not handle. As a result, summaries can be generated only for
videos that have built-in subtitles.
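The eligibility check described above can be sketched as two small steps: extract the
video ID from the page URL, then confirm that subtitles exist in a supported language.
This is an illustration under stated assumptions: the function names, the default
language list `("en",)` and the injected `available_languages` callable (which would wrap
whatever transcript lookup the backend uses) are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url):
    """Return the video ID from a YouTube watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def check_eligibility(url, available_languages, wanted=("en",)):
    """Decide whether a video can be summarized.

    available_languages(video_id) -> list of subtitle language codes is a
    hypothetical placeholder for the backend's transcript lookup.
    """
    video_id = extract_video_id(url)
    if video_id is None:
        return {"eligible": False, "reason": "not a YouTube video URL"}
    captions = available_languages(video_id)
    for lang in wanted:
        if lang in captions:
            return {"eligible": True, "video_id": video_id, "language": lang}
    return {"eligible": False, "reason": "no subtitles in a supported language"}
```

Running this check before summarization lets the extension show a clear message for
videos without subtitles instead of failing partway through the request.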
7.2 CONCLUSION:
The development of software involves many people, such as the system developer,
the users of the system, and the management. It is important to identify the system
requirements by properly collecting the required data through interaction with the
supplier and the customer of the system.
Proper design builds upon this foundation to give a blueprint, which is then
implemented by the developers.
Recognizing the importance of systematic documentation, all the processes were
implemented using a software engineering approach. Working in a live environment
enables one to appreciate the intricacies involved in the System Development Life
Cycle (SDLC).
I have gained a lot of practical knowledge from this project, which I believe will
stand me in good stead in the future.
CHAPTER – 8
REFERENCES
List of useful References:
• https://pypi.org/project/youtube-transcript-api/
• https://www.thepythoncode.com/article/text-summarization-using-huggingface-transformers-python
• https://medium.com/better-programming/the-ultimate-guide-to-building-a-chrome-extension-4c01834c63ec
• https://developer.chrome.com/docs/extensions/mv2/user_interface/
• https://www.thepythoncode.com/article/translate-text-in-python