
An NLP Mini Project Report on

“CAPTION GENERATOR”

Submitted in partial fulfillment of the requirements for the award of the Degree of

Bachelor of Engineering
In
Computer Engineering
By

PRANAV AMALE (01)

ANIKET DHAWALE (11)

MAYURESH DESAI (08)

Under the Guidance of

Prof. RUCHA PATWARDHAN

Department of Computer Engineering


Watumull Institute of Engineering and Technology
Ulhasnagar - 421003
UNIVERSITY OF MUMBAI
Academic Year 2024-2025

I
Approval Sheet

This Mini Project Report entitled "CAPTION GENERATOR", submitted by "PRANAV
AMALE" (01), "ANIKET DHAWALE" (11), "MAYURESH DESAI" (08), is approved for
the partial fulfilment of the requirement for the award of the degree of Bachelor of
Engineering in Computer Engineering from the University of Mumbai.

Prof.
(Guide)

Prof.
(H.O.D)

Place: Ulhasnagar

Date:

II
CERTIFICATE

This is to certify that the mini project entitled "CAPTION GENERATOR" submitted by
"PRANAV AMALE" (01), "ANIKET DHAWALE" (11), "MAYURESH DESAI" (08)
for the partial fulfilment of the requirement for the award of the degree of Bachelor of
Engineering in Computer Engineering, to the University of Mumbai, is a bonafide work
carried out during the academic year 2024-2025.

Prof.
(Guide)

Examiners:

Place: Ulhasnagar

Date: / /24

III
Declaration

We declare that this written submission represents our ideas in our own words and, where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented, fabricated, or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for
disciplinary action by the Institute and can also evoke penal action from the sources which
have thus not been properly cited or from whom proper permission has not been taken when
needed.

Signature

PRANAV AMALE(01)

ANIKET DHAWALE (11)

MAYURESH DESAI (08)

Place: Ulhasnagar

Date: / /24

IV
Abstract

This project presents a basic implementation of a subtitle generator using Python's
SpeechRecognition library. The subtitle generator automatically transcribes speech
from an audio source into text using Google's Speech Recognition API. The project
aims to provide a simple solution for creating subtitles in various multimedia
applications, such as video content production, accessibility for the hearing impaired,
and automated transcription services. The project demonstrates how to convert speech
into text, providing an essential component for subtitle generation in media
workflows.

V
Content
Sr. No. Topic Page No

1. Introduction 1-4
1.1 Overview 1
1.2 Objective 2
1.3 Scope 3
1.4 Purpose 4

2. Problem Definition 5-6
2.1 Problem Statement 5
2.2 Technology Used 5

3. Hardware and Software Specification 7
3.1 Hardware Specification 7
3.2 Software Specification 7

4. Design 8-11
4.1 Explanation 8
4.2 Steps in the Program 9
4.3 Flowchart 10

5. Result 12-14
5.1 Screenshots of Test Results 12

6. Future Scope and Enhancements 15

7. Conclusion 16

8. Bibliography 17

Acknowledgement 18

VI
Chapter 1 : Introduction
1.1 : Overview

Subtitles, also known as captions, are essential for improving the accessibility and
comprehension of audio-visual content. They help viewers follow along by displaying
the spoken dialogue in written form. This project focuses on implementing a basic yet
effective subtitle generator using automatic speech recognition (ASR) technology.
The core functionality involves taking an audio input, either from a file or
microphone, and converting it into text, which can be displayed as subtitles.

The project leverages Python's SpeechRecognition library, which provides a
straightforward interface for interacting with speech-to-text services. The
implementation uses Google's Speech-to-Text API to process audio and generate
subtitles.
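That interface can be sketched in a few lines. This is a minimal sketch, assuming the third-party SpeechRecognition package is installed (`pip install SpeechRecognition`); the function name and file name are illustrative, not part of the report's code:

```python
def transcribe_wav(path):
    """Transcribe a WAV file to text via Google's free web speech API.

    Requires the third-party SpeechRecognition package; it is imported
    lazily here so the rest of the module works without it.
    """
    import speech_recognition as sr  # pip install SpeechRecognition
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    return recognizer.recognize_google(audio)

# Example usage (needs a real WAV file and network access):
# print(transcribe_wav("speech.wav"))
```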

1.2 : Objective

The primary objective of this project is to build a functional, minimalistic subtitle
generator that processes audio input and converts it into text. The generated text can
then be used as subtitles for multimedia content or other applications where
speech-to-text conversion is needed. The project aims to provide a clear and
understandable method of generating subtitles without requiring extensive hardware
or software resources.

1.3 : Scope

The scope of this project is limited to basic subtitle generation through speech
recognition, using either a microphone or an audio file (WAV format). The subtitles
are displayed as plain text in the console, but the project can be extended to
incorporate time-stamped subtitles for video playback. It is designed to handle speech
in a quiet environment, and its performance may degrade in noisy surroundings or
with poor-quality audio files. The project is scalable and could be expanded with
additional features like real-time subtitles and multilingual support.

1.4 : Purpose

The purpose of this project is to demonstrate a basic method for generating subtitles
using speech-to-text technology. The generated subtitles can enhance accessibility in
videos for hearing-impaired individuals, provide automated transcription for content
creators, and assist in creating closed captions for educational, entertainment, and
business videos. Additionally, this project serves as a foundation for more advanced
subtitle generation systems, incorporating features like speaker identification and
real-time processing.

8
Chapter 2 : Problem Definition
2.1 : Problem Statement

Manual transcription and subtitle creation can be a time-consuming and tedious
process, especially for long audio or video files. There is a growing need for
automated solutions that can quickly and accurately generate subtitles to enhance
accessibility and improve content consumption experiences. This project addresses
this issue by creating an automated subtitle generator that converts spoken words into
readable text using a speech recognition API.

2.2 : Technology Used

The project uses the following technologies:

• Python: The programming language chosen for its simplicity and ease of
integration with various libraries.
• SpeechRecognition Library: This Python library provides easy access to
speech recognition engines such as Google's Speech-to-Text API.
• Google Speech-to-Text API: A cloud-based service that converts audio into
text using machine learning models.
• Pydub Library (optional): Used for audio file format conversion where
necessary (e.g., from MP3 to WAV).
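The optional Pydub conversion step could look like the following. This is a hedged sketch: pydub is a third-party package (`pip install pydub`) that also needs an ffmpeg installation on the system, and the function name is our own illustrative choice:

```python
def mp3_to_wav(mp3_path, wav_path):
    """Convert an MP3 file to WAV so SpeechRecognition can read it.

    Requires the third-party pydub package plus ffmpeg; imported
    lazily so the rest of the module works without them.
    """
    from pydub import AudioSegment  # pip install pydub
    sound = AudioSegment.from_mp3(mp3_path)
    sound.export(wav_path, format="wav")

# Example usage (needs a real MP3 file and ffmpeg installed):
# mp3_to_wav("lecture.mp3", "lecture.wav")
```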

9
Chapter 3 : Hardware and Software Specification
3.1 : Hardware Specification

• Processor: Intel Core i3 or higher
• RAM: 4 GB (minimum)
• Storage: 500 MB of free disk space for storing audio files
• Microphone: External or built-in microphone (if using live audio input)

3.2 : Software Specification

• Operating System: Windows 7 or higher / macOS / Linux
• Python Version: Python 3.x
• Required Libraries:
o speech_recognition: Handles the audio input and communicates with
Google's API.
o pydub (optional): Converts non-WAV audio formats to WAV, as
required by SpeechRecognition.
o Google Speech-to-Text API: the cloud-based recognition service that
speech_recognition calls over the network (a service, not an installed
library).

10
Chapter 4 : Design
4.1 : Explanation

The project revolves around capturing audio, processing it with the
SpeechRecognition library, and converting the speech into text using Google's
Speech-to-Text API. The audio can be input either from a pre-recorded file (in WAV
format) or live through a microphone. The recognized text is printed in the terminal
and can be displayed as subtitles for various applications.

4.2 : Steps in the Program

1. Audio Input: The system captures the audio, either from a microphone or an
audio file.
2. Audio Processing: The audio data is processed using the Recognizer class
from the SpeechRecognition library.
3. Speech-to-Text Conversion: The processed audio is sent to Google’s Speech
Recognition API, which converts it into text.
4. Error Handling: The system handles exceptions in case the audio cannot be
processed or understood.
5. Display Subtitles: The generated text is displayed as subtitles.
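Assuming the SpeechRecognition package is available, the five steps above can be sketched end to end. All names, the file path, and the 42-character line width are our illustrative choices, not part of the report's code:

```python
import textwrap


def wrap_subtitles(text, width=42):
    """Step 5: wrap recognized text into subtitle-length lines."""
    return textwrap.wrap(text, width=width)


def generate_subtitles(wav_path=None):
    """Steps 1-4: capture audio, process it, convert it to text, handle errors."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition
    recognizer = sr.Recognizer()
    # Step 1: audio input - a WAV file if given, otherwise the microphone.
    if wav_path:
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
    else:
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source)
            audio = recognizer.listen(source)
    # Steps 2-4: send the processed audio to Google's API and
    # handle the two failure modes the library documents.
    try:
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ["[Could not understand the audio]"]
    except sr.RequestError as err:
        return ["[API request failed: {}]".format(err)]
    return wrap_subtitles(text)


# Example usage (needs a WAV file or microphone, plus network access):
# for line in generate_subtitles("speech.wav"):
#     print(line)
```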

11
4.3 : Flowchart

12
Chapter 5 : Result
Screenshots of Test Results

13
14
Chapter 6 : Future Scope and Enhancements
The current implementation is a basic proof-of-concept that can be significantly
enhanced in the following ways:

• Real-time Subtitle Generation: Currently, the system waits for the entire
audio input to be processed before generating subtitles. Future enhancements
could enable real-time subtitle generation, where subtitles appear
dynamically as the audio is played or spoken.
• Timestamped Subtitles: In this project, the generated subtitles are plain text.
Future versions could incorporate timestamps to create subtitles compatible
with media players.
• Multiple Language Support: Expand the system to recognize and transcribe
speech in multiple languages, enabling global use.
• Noise Reduction: Integrate advanced noise filtering algorithms to improve
recognition accuracy in noisy environments.
• Speaker Identification: Add functionality to detect and distinguish between
multiple speakers, which is especially useful for creating more organized
subtitles.
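As a sketch of the timestamped-subtitles idea: SRT, the format most media players accept, numbers each cue and gives its start and end time as HH:MM:SS,mmm. A small pure-Python helper could emit that format (the function names are our own; the timestamp layout is standard SRT):

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, secs, millis)


def srt_cue(index, start, end, text):
    """Build one numbered SRT cue block for a recognized phrase."""
    return "{}\n{} --> {}\n{}\n".format(
        index, srt_timestamp(start), srt_timestamp(end), text
    )


# Example: print(srt_cue(1, 0.0, 2.5, "Hello, world"))
# emits a cue running from 00:00:00,000 to 00:00:02,500.
```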

15
Chapter 7 : Conclusion
This project successfully demonstrates the implementation of a basic subtitle
generator using Python's SpeechRecognition library. The tool provides a simple
solution for converting audio into text, which can be used for various purposes such
as video subtitles, transcription, or accessibility services. The project lays a solid
foundation for further enhancements, including real-time processing, timestamped
subtitles, and support for multiple languages.

This mini-project effectively highlights the power of speech recognition technology
and shows how easily accessible tools like Python can be used to create practical
solutions in the realm of multimedia and accessibility.

16
Chapter 8 : Bibliography
SpeechRecognition Library Documentation:
https://pypi.org/project/SpeechRecognition/

Google Cloud Speech-to-Text API:
https://cloud.google.com/speech-to-text

Python Official Documentation:
https://docs.python.org/3/

Tkinter Documentation (Python's Standard GUI Library):
https://docs.python.org/3/library/tk.html

Pydub Library for Audio File Manipulation:
https://pypi.org/project/pydub/

Stack Overflow - Handling Errors in Speech Recognition:
https://stackoverflow.com/questions/41682546/how-to-handle-exceptions-in-speechrecognition-in-python

Real Python - Introduction to Python GUI Programming with Tkinter:
https://realpython.com/python-gui-tkinter/

Google Speech-to-Text API Quotas and Limits:
https://cloud.google.com/speech-to-text/quotas

Wikipedia - Speech Recognition:
https://en.wikipedia.org/wiki/Speech_recognition

Medium - Build a Speech Recognition App with Python:
https://medium.com/analytics-vidhya/build-a-speech-recognition-application-using-python-4bff531bb02e

17
Acknowledgement

We have great pleasure in presenting the mini project report on "CAPTION
GENERATOR". We take this opportunity to express our sincere thanks towards
our guide, Prof. RUCHA PATWARDHAN, Department of Computer
Engineering, Watumull Institute, for providing the technical guidelines and
suggestions regarding the line of work. We would like to express our gratitude
for her constant encouragement, support, and guidance throughout the
development of the project.

We thank Prof. NILESH K. MEHTA, Head of the Department of Computer
Engineering, Watumull Institute, for his encouragement during progress meetings
and for providing guidelines for writing this report.

We thank Prof. AVINASH V. GONDAL, the Principal of Watumull Institute, for
his valuable suggestions and constant support.

We also thank the entire staff of Watumull Institute of Electronics Engineering
and Computer Technology for their invaluable help rendered during the course of
this work. We wish to express our deep gratitude towards all our colleagues of
Watumull Institute of Electronics Engineering and Computer Technology for their
encouragement.

PRANAV AMALE (01)

ANIKET DHAWALE (11)

MAYURESH DESAI (08)

18
