
An NLP Mini Project Report on

“CAPTION GENERATOR”

Submitted in partial fulfillment of the requirements for the award of the Degree of

Bachelor of Engineering
In
Computer Engineering
By

PRANAV AMALE (01)

ANIKET DHAWALE (11)

MAYURESH DESAI (08)

Under the Guidance of

Prof. RUCHA PATWARDHAN

Department of Computer Engineering


Watumull Institute of Engineering and Technology
Ulhasnagar - 421003
UNIVERSITY OF MUMBAI
Academic Year 2024-2025

I
Approval Sheet

This Mini Project Report entitled "CAPTION GENERATOR", submitted by "PRANAV
AMALE" (01), "ANIKET DHAWALE" (11), "MAYURESH DESAI" (08), is approved for
the partial fulfilment of the requirement for the award of the degree of Bachelor of
Engineering in Computer Engineering from the University of Mumbai.

Prof.
(Guide)

Prof.
(H.O.D)

Place: Ulhasnagar

Date:

II
CERTIFICATE

This is to certify that the mini project entitled "CAPTION GENERATOR" submitted by
"PRANAV AMALE" (01), "ANIKET DHAWALE" (11), "MAYURESH DESAI" (08)
for the partial fulfilment of the requirement for the award of the degree of Bachelor of
Engineering in Computer Engineering, to the University of Mumbai, is a bonafide work
carried out during the academic year 2024-2025.

Prof.
(Guide)

Examiners:

Place: Ulhasnagar

Date: / /24

III
Declaration

We declare that this written submission represents our ideas in our own words and, where
others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented, fabricated, or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for
disciplinary action by the Institute and can also evoke penal action from the sources which
have thus not been properly cited or from whom proper permission has not been taken when
needed.

Signature

PRANAV AMALE(01)

ANIKET DHAWALE (11)

MAYURESH DESAI (08)

Place: Ulhasnagar

Date: / /24

IV
Abstract

This project presents a basic implementation of a subtitle generator using Python's
SpeechRecognition library. The subtitle generator automatically transcribes speech
from an audio source into text using Google's Speech Recognition API. The project
aims to provide a simple solution for creating subtitles in various multimedia
applications, such as video content production, accessibility for the hearing impaired,
and automated transcription services. The project demonstrates how to convert speech
into text, providing an essential component for subtitle generation in media
workflows.

V
Content
Sr. No. Topic Page No

1. Introduction 1-4
1.1 Overview 1
1.2 Objective 2
1.3 Scope 3
1.4 Purpose 4

2. Problem Definition 5-6
2.1 Problem Statement 5
2.2 Technology Used 5

3. Hardware and Software Specification 7
3.1 Hardware Specification 7
3.2 Software Specification 7

4. Design 8-11
4.1 Explanation 8
4.2 Steps in the Program 9
4.3 Flowchart 10

5. Result 12-14
5.1 Screenshots of Test Results 12

6. Future Scope and Enhancements 15

7. Conclusion 16

8. Bibliography 17

Acknowledgement 18

VI
Chapter 1 : Introduction
1.1 : Overview

Subtitles, also known as captions, are essential for improving the accessibility and
comprehension of audio-visual content. They help viewers follow along by displaying
the spoken dialogue in written form. This project focuses on implementing a basic yet
effective subtitle generator using automatic speech recognition (ASR) technology.
The core functionality involves taking an audio input, either from a file or
microphone, and converting it into text, which can be displayed as subtitles.

The project leverages Python's SpeechRecognition library, which provides a
straightforward interface for interacting with speech-to-text services. The
implementation uses Google's Speech-to-Text API to process audio and generate
subtitles.
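That interface can be sketched in a few lines. This is a minimal sketch, assuming the third-party SpeechRecognition package is installed (`pip install SpeechRecognition`); the function name and file name are illustrative, not part of the report's code:

```python
def transcribe_wav(path):
    """Transcribe a WAV file to text via Google's free web speech API.

    Requires the third-party SpeechRecognition package; it is imported
    lazily here so the rest of the module works without it.
    """
    import speech_recognition as sr  # pip install SpeechRecognition
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    return recognizer.recognize_google(audio)

# Example usage (needs a real WAV file and network access):
# print(transcribe_wav("speech.wav"))
```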

1.2 : Objective

The primary objective of this project is to build a functional, minimalistic subtitle
generator that processes audio input and converts it into text. The generated text can
then be used as subtitles for multimedia content or other applications where
speech-to-text conversion is needed. The project aims to provide a clear and
understandable method of generating subtitles without requiring extensive hardware
or software resources.

1.3 : Scope

The scope of this project is limited to basic subtitle generation through speech
recognition, using either a microphone or an audio file (WAV format). The subtitles
are displayed as plain text in the console, but the project can be extended to
incorporate time-stamped subtitles for video playback. It is designed to handle speech
in a quiet environment, and its performance may degrade in noisy surroundings or
with poor-quality audio files. The project is scalable and could be expanded with
additional features like real-time subtitles and multilingual support.

1.4 : Purpose

The purpose of this project is to demonstrate a basic method for generating subtitles
using speech-to-text technology. The generated subtitles can enhance accessibility in
videos for hearing-impaired individuals, provide automated transcription for content
creators, and assist in creating closed captions for educational, entertainment, and
business videos. Additionally, this project serves as a foundation for more advanced
subtitle generation systems, incorporating features like speaker identification and
real-time processing.

8
Chapter 2 : Problem Definition
2.1 : Problem Statement

Manual transcription and subtitle creation can be a time-consuming and tedious
process, especially for long audio or video files. There is a growing need for
automated solutions that can quickly and accurately generate subtitles to enhance
accessibility and improve content consumption experiences. This project addresses
this issue by creating an automated subtitle generator that converts spoken words into
readable text using a speech recognition API.

2.2 : Technology Used

The project uses the following technologies:

• Python: The programming language chosen for its simplicity and ease of
integration with various libraries.
• SpeechRecognition Library: This Python library provides easy access to
speech recognition engines such as Google's Speech-to-Text API.
• Google Speech-to-Text API: A cloud-based service that converts audio into
text using machine learning models.
• Pydub Library (optional): Used for audio file format conversion where
necessary (e.g., from MP3 to WAV).
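The optional Pydub conversion step could look like the following. This is a hedged sketch: pydub is a third-party package (`pip install pydub`) that also needs an ffmpeg installation on the system, and the function name is our own illustrative choice:

```python
def mp3_to_wav(mp3_path, wav_path):
    """Convert an MP3 file to WAV so SpeechRecognition can read it.

    Requires the third-party pydub package plus ffmpeg; imported
    lazily so the rest of the module works without them.
    """
    from pydub import AudioSegment  # pip install pydub
    sound = AudioSegment.from_mp3(mp3_path)
    sound.export(wav_path, format="wav")

# Example usage (needs a real MP3 file and ffmpeg installed):
# mp3_to_wav("lecture.mp3", "lecture.wav")
```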

9
Chapter 3 : Hardware and Software Specification
3.1 : Hardware Specification

• Processor: Intel Core i3 or higher
• RAM: 4 GB (minimum)
• Storage: 500 MB of free disk space for storing audio files
• Microphone: External or built-in microphone (if using live audio input)

3.2 : Software Specification

• Operating System: Windows 7 or higher / macOS / Linux
• Python Version: Python 3.x
• Required Libraries:
o speech_recognition: Handles the audio input and communicates with
Google's API.
o pydub (optional): Converts non-WAV audio formats to WAV, as
required by SpeechRecognition.
o Google Speech-to-Text API: the cloud-based recognition service that
speech_recognition calls over the network (a service, not an installed
library).

10
Chapter 4 : Design
4.1 : Explanation

The project revolves around capturing audio, processing it with the
SpeechRecognition library, and converting the speech into text using Google's
Speech-to-Text API. The audio can be input either from a pre-recorded file (in WAV
format) or live through a microphone. The recognized text is printed in the terminal
and can be displayed as subtitles for various applications.

4.2 : Steps in the Program

1. Audio Input: The system captures the audio, either from a microphone or an
audio file.
2. Audio Processing: The audio data is processed using the Recognizer class
from the SpeechRecognition library.
3. Speech-to-Text Conversion: The processed audio is sent to Google’s Speech
Recognition API, which converts it into text.
4. Error Handling: The system handles exceptions in case the audio cannot be
processed or understood.
5. Display Subtitles: The generated text is displayed as subtitles.
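Assuming the SpeechRecognition package is available, the five steps above can be sketched end to end. All names, the file path, and the 42-character line width are our illustrative choices, not part of the report's code:

```python
import textwrap


def wrap_subtitles(text, width=42):
    """Step 5: wrap recognized text into subtitle-length lines."""
    return textwrap.wrap(text, width=width)


def generate_subtitles(wav_path=None):
    """Steps 1-4: capture audio, process it, convert it to text, handle errors."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition
    recognizer = sr.Recognizer()
    # Step 1: audio input - a WAV file if given, otherwise the microphone.
    if wav_path:
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
    else:
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source)
            audio = recognizer.listen(source)
    # Steps 2-4: send the processed audio to Google's API and
    # handle the two failure modes the library documents.
    try:
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ["[Could not understand the audio]"]
    except sr.RequestError as err:
        return ["[API request failed: {}]".format(err)]
    return wrap_subtitles(text)


# Example usage (needs a WAV file or microphone, plus network access):
# for line in generate_subtitles("speech.wav"):
#     print(line)
```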

11
4.3 : Flowchart

12
Chapter 5 : Result
Screenshots of Test Results

13
14
Chapter 6 : Future Scope and Enhancements
The current implementation is a basic proof-of-concept that can be significantly
enhanced in the following ways:

• Real-time Subtitle Generation: Currently, the system waits for the entire
audio input to be processed before generating subtitles. Future enhancements
could enable real-time subtitle generation, where subtitles appear
dynamically as the audio is played or spoken.
• Timestamped Subtitles: In this project, the generated subtitles are plain text.
Future versions could incorporate timestamps to create subtitles compatible
with media players.
• Multiple Language Support: Expand the system to recognize and transcribe
speech in multiple languages, enabling global use.
• Noise Reduction: Integrate advanced noise filtering algorithms to improve
recognition accuracy in noisy environments.
• Speaker Identification: Add functionality to detect and distinguish between
multiple speakers, which is especially useful for creating more organized
subtitles.
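As a sketch of the timestamped-subtitles idea: SRT, the format most media players accept, numbers each cue and gives its start and end time as HH:MM:SS,mmm. A small pure-Python helper could emit that format (the function names are our own; the timestamp layout is standard SRT):

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, secs, millis)


def srt_cue(index, start, end, text):
    """Build one numbered SRT cue block for a recognized phrase."""
    return "{}\n{} --> {}\n{}\n".format(
        index, srt_timestamp(start), srt_timestamp(end), text
    )


# Example: print(srt_cue(1, 0.0, 2.5, "Hello, world"))
# emits a cue running from 00:00:00,000 to 00:00:02,500.
```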

15
Chapter 7 : Conclusion
This project successfully demonstrates the implementation of a basic subtitle
generator using Python's SpeechRecognition library. The tool provides a simple
solution for converting audio into text, which can be used for various purposes such
as video subtitles, transcription, or accessibility services. The project lays a solid
foundation for further enhancements, including real-time processing, timestamped
subtitles, and support for multiple languages.

This mini-project effectively highlights the power of speech recognition technology
and shows how easily accessible tools like Python can be used to create practical
solutions in the realm of multimedia and accessibility.

16
Chapter 8 : Bibliography
SpeechRecognition Library Documentation:
https://pypi.org/project/SpeechRecognition/

Google Cloud Speech-to-Text API:
https://cloud.google.com/speech-to-text

Python Official Documentation:
https://docs.python.org/3/

Tkinter Documentation (Python's Standard GUI Library):
https://docs.python.org/3/library/tk.html

Pydub Library for Audio File Manipulation:
https://pypi.org/project/pydub/

Stack Overflow - Handling Errors in Speech Recognition:
https://stackoverflow.com/questions/41682546/how-to-handle-exceptions-in-speechrecognition-in-python

Real Python - Introduction to Python GUI Programming with Tkinter:
https://realpython.com/python-gui-tkinter/

Google Speech-to-Text API Quotas and Limits:
https://cloud.google.com/speech-to-text/quotas

Wikipedia - Speech Recognition:
https://en.wikipedia.org/wiki/Speech_recognition

Medium - Build a Speech Recognition App with Python:
https://medium.com/analytics-vidhya/build-a-speech-recognition-application-using-python-4bff531bb02e

17
Acknowledgement

We have great pleasure in presenting the mini project report on "CAPTION
GENERATOR". We take this opportunity to express our sincere thanks towards
our guide, Prof. RUCHA PATWARDHAN, Department of Computer
Engineering, Watumull Institute, for providing the technical guidelines and
suggestions regarding the line of work. We would like to express our gratitude
for her constant encouragement, support, and guidance throughout the
development of the project.

We thank Prof. NILESH K. MEHTA, Head of the Department of Computer
Engineering, Watumull Institute, for his encouragement during progress meetings
and for providing guidelines for writing this report.

We thank Prof. AVINASH V. GONDAL, the Principal of Watumull Institute, for
his valuable suggestions and constant support.

We also thank the entire staff of Watumull Institute of Electronics Engineering
and Computer Technology for their invaluable help rendered during the course of
this work. We wish to express our deep gratitude towards all our colleagues of
Watumull Institute of Electronics Engineering and Computer Technology for their
encouragement.

PRANAV AMALE (01)

ANIKET DHAWALE (11)

MAYURESH DESAI (08)

18
