Text Summarizer
Synopsis
Submitted by:
Shwetank Verma (19103209)
Ishaan Raj Mishra (19103210)
Varun Mittal (19103266)
Amritansh Gupta (19103305)
Department of CSE/IT
JAYPEE INSTITUTE OF INFORMATION TECHNOLOGY
Table of Contents
Abstract
Introduction
Background Study
Flowchart and Dataset Description
References
ABSTRACT
The amount of text data available has increased dramatically in recent years from a variety of sources.
This large volume of literature has a wealth of information and knowledge that must be adequately
summarized to be useful.
One of the most difficult NLP tasks is summarization, which is the process of generating a shorter
version of a piece of text while keeping critical context information.
The goal is to provide a condensed representation of an input text that captures the original text's basic
meaning.
To produce a condensed version, most successful summarization systems use extractive algorithms that
crop out and stitch together chunks of the text.
INTRODUCTION
Before producing the required summary texts, machine learning algorithms can be trained to interpret
documents and identify the areas that carry key facts and information.
Summarization improves the readability of publications, cuts down on time spent searching for
information, and allows for more information to be crammed into a given space.
We will be working on extraction-based summarization in this project.
Extractive text summarization entails selecting the essential phrases and sentences from the original
document and combining them to create a summary.
It works by assigning weights to the sentences of the text and using those weights to construct the
summary.
Several algorithms and approaches can be employed to determine these weights. They rank the sentences
according to their relevance and similarity to one another, and the top-ranked sentences are then joined to form the summary.
Even though the output of extraction-based summarization does not always read smoothly, we
nevertheless get a concise and useful summary.
BACKGROUND STUDY
RESEARCH PAPER 1
TITLE: Analytical study of Text Summarization Techniques
AUTHOR: Dr. Pooja Raundale, Himanshu Shekhar
PUBLISHER: IEEE
PUBLISHED IN: October 2021
SUMMARY: They implemented and compared the performance of various automatic summarization
methods to gain insight into how long the methods take to implement and how accurate and human-like
the generated summaries are.
Extractive techniques (TF-IDF and TextRank) achieve very high ROUGE scores.
Abstractive techniques such as Seq2Seq with Attention and Pointer-Generator score considerably lower than
these two, because they generate human-like summaries in new wording, whereas ROUGE rewards word overlap with the reference text.
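The effect can be illustrated with a simplified ROUGE-1 computation. This is only a sketch: it assumes whitespace tokenization and plain unigram overlap, the function name rouge_1 and the example sentences are made up for illustration, and it is not the evaluation code used in the paper.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: unigram-overlap precision, recall, and F1."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: a word is matched at most as often as it occurs in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative sentences (assumptions, not taken from any dataset).
reference = "take the tablet twice daily after meals"
extractive = "take the tablet twice daily"           # reuses the source wording
abstractive = "swallow one pill two times each day"  # same meaning, new wording

print(rouge_1(extractive, reference))
print(rouge_1(abstractive, reference))
```

The extractive candidate reuses the reference's words and scores high, while the paraphrase shares no words with the reference and scores zero even though it conveys the same meaning.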
RESEARCH PAPER 2
TITLE: Extractive Text Summarization Using Sentence Ranking
AUTHOR: J.N. Madhuri, Ganesh Kumar R.
PUBLISHER: IEEE
PUBLISHED IN: August 2019
SUMMARY: In this work, they proposed extractive text summarization using a novel statistical
approach based on sentence ranking, in which the summarizer selects the sentences. The extracted
sentences are output as the summarized text.
The sentences are sorted by their weighted-frequency ranks in descending order, from highest to
lowest. The summarizer then extracts the sentences with the highest weighted frequency
to form the summary of the document.
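A minimal sketch of sentence ranking by weighted frequency is given below. It follows one common reading of the idea (word count divided by the highest word count, sentence score as the sum of its word weights) and is not the paper's exact method. The stop-word list, the naive sentence splitter, and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a real list would be larger).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to", "with"}


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_by_frequency(text: str, num_sentences: int = 2) -> str:
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    if not words:
        return ""
    counts = Counter(words)
    max_count = max(counts.values())
    # Weighted frequency: each word's count divided by the highest count.
    weights = {w: c / max_count for w, c in counts.items()}

    # A sentence's score is the sum of the weights of its words.
    scores = []
    for index, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        scores.append((sum(weights.get(t, 0.0) for t in tokens), index))

    # Pick the highest-scoring sentences, then restore document order.
    top = sorted(scores, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    return " ".join(sentences[i] for _, i in sorted(top, key=lambda pair: pair[1]))


# Illustrative input (an assumption, not taken from the project's dataset).
text = ("The tablet relieves pain. Take the tablet with water after meals. "
        "Store it in a cool place.")
print(summarize_by_frequency(text, num_sentences=1))
```

The num_sentences parameter controls the length of the summary. Selected sentences are returned in their original order so the result reads naturally.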
FLOWCHART REPRESENTATION
DATASET DESCRIPTION
The dataset contains numerous paragraphs describing various types of medications and how to consume them,
including the benefits and aftereffects of each medication. It also contains the doctor's directions on when to
consume them in various situations and what to avoid while consuming them.
DESCRIPTION OF THE PROJECT
This project performs automatic text summarization: it summarizes a given paragraph using natural language
processing and machine learning. There has been an explosion in the amount of text data from a variety
of sources. This volume of text is an invaluable source of information and knowledge which needs to be
effectively summarized to be useful. The main approaches to automatic text summarization
are described below.
The dataset used in this project contains long descriptions of products. The task is to make a text
summarizer that takes these descriptions as input and summarizes them into shorter versions without
losing the context. The length of the summary will also be adjustable by the user.
There are two general approaches to automatic summarization: Extraction and Abstraction.
Extractive Summarization: These methods rely on extracting several parts, such as phrases and sentences,
from a piece of text and stacking them together to create a summary. Therefore, identifying the right
sentences for summarization is of utmost importance in an extractive method.
Abstractive Summarization: These methods use advanced NLP techniques to generate an entirely new
summary. Some parts of this summary may not even appear in the original text. Such a summary might
include verbal innovations. Research to date has focused primarily on extractive methods, which are also
appropriate for image collection summarization and video summarization.
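In this Jupyter notebook, the TextRank algorithm for extractive text summarization is implemented.
TextRank is based on Google's PageRank algorithm: it builds a graph of sentences linked by their
similarity to one another and ranks the sentences by their importance in that graph.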
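A minimal sketch of this approach is given below, using only the Python standard library. It is not the notebook's code. The cosine similarity over raw word counts, the naive sentence splitter, the damping factor of 0.85, the fixed iteration count, and the function names are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank_summary(text: str, num_sentences: int = 2,
                     damping: float = 0.85, iterations: int = 50) -> str:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    n = len(sentences)
    if n == 0:
        return ""
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]

    # Graph: nodes are sentences, edge weights are pairwise similarities.
    sim = [[cosine_similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]

    # PageRank by power iteration on the weighted graph. Sentences with no
    # similar neighbours pass on no score, so scores serve for ranking only.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]

    # Keep the top-ranked sentences and return them in document order.
    top = sorted(range(n), key=lambda i: (-scores[i], i))[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


# Illustrative input (an assumption, not taken from the project's dataset).
text = ("Take the tablet after meals. Take the tablet with water. "
        "Never take the tablet with alcohol. Store in cool dry place.")
print(textrank_summary(text, num_sentences=3))
```

Sentences that share vocabulary with many other sentences receive higher scores, while a sentence with no overlap, such as the storage instruction in the example, is ranked last. The num_sentences parameter is one simple way to make the summary length adjustable.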
Finally, the generated summary for each paragraph is added to the DataFrame, and the DataFrame
is then written to a CSV file.
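This final step might look like the sketch below, which assumes the pandas library. The column names, the sample rows, the output file name, and the placeholder first_sentence function are hypothetical stand-ins; in the project the summaries would come from the TextRank summarizer.

```python
import pandas as pd


def first_sentence(text: str) -> str:
    """Placeholder summarizer (hypothetical): keeps only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


# Column names and rows are illustrative assumptions, not the project's data.
df = pd.DataFrame({
    "description": [
        "Take one tablet daily. Avoid alcohol while on this medication.",
        "Apply the cream twice a day. Wash hands after use.",
    ]
})

# Add one summary per paragraph as a new column.
df["summary"] = df["description"].apply(first_sentence)

# Write the result to a CSV file; index=False leaves out the row index.
df.to_csv("summaries.csv", index=False)
```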
REFERENCES
1. Luís Gonçalves, Automatic Text Summarization with Machine Learning, Apr 12, 2020
https://medium.com/luisfredgs/automatic-text-summarization-with-machine-learning-an-overview-68ded5717a25
2. Shrivarsheni, Text Summarization Approaches for NLP, Oct 26, 2020
https://www.machinelearningplus.com/nlp/text-summarization-approaches-nlp-example/
3. Aravindpai, Comprehensive Guide to Text Summarization using Deep Learning in Python, June 10, 2019
https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/
4. Alfrick Opidi, Gentle Introduction to Text Summarization in Machine Learning, Apr 15, 2019
https://blog.floydhub.com/gentle-introduction-to-text-summarization-in-machine-learning/